aboutsummaryrefslogtreecommitdiffhomepage
path: root/docs/schema.html
diff options
context:
space:
mode:
Diffstat (limited to 'docs/schema.html')
-rw-r--r--docs/schema.html949
1 files changed, 949 insertions, 0 deletions
diff --git a/docs/schema.html b/docs/schema.html
new file mode 100644
index 0000000..0564f26
--- /dev/null
+++ b/docs/schema.html
@@ -0,0 +1,949 @@
1<!DOCTYPE html>
2<html>
3<head>
4 <meta http-equiv="content-type" content="text/html;charset=utf-8">
5 <title>schema.py</title>
6 <link rel="stylesheet" href="pycco.css">
7</head>
8<body>
9<div id='container'>
10 <div id="background"></div>
11 <div class='section'>
12 <div class='docs'><h1>schema.py</h1></div>
13 </div>
14 <div class='clearall'>
15 <div class='section' id='section-0'>
16 <div class='docs'>
17 <div class='octowrap'>
18 <a class='octothorpe' href='#section-0'>#</a>
19 </div>
20
21 </div>
22 <div class='code'>
23 <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
24<span class="kn">import</span> <span class="nn">json</span>
25<span class="kn">import</span> <span class="nn">re</span>
26<span class="kn">import</span> <span class="nn">urllib.parse</span>
27<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">OrderedDict</span>
28<span class="kn">import</span> <span class="nn">singer</span>
29<span class="kn">from</span> <span class="nn">singer</span> <span class="kn">import</span> <span class="n">metadata</span>
30<span class="kn">from</span> <span class="nn">tap_google_sheets.streams</span> <span class="kn">import</span> <span class="n">STREAMS</span>
31
32<span class="n">LOGGER</span> <span class="o">=</span> <span class="n">singer</span><span class="o">.</span><span class="n">get_logger</span><span class="p">()</span></pre></div>
33 </div>
34 </div>
35 <div class='clearall'></div>
36 <div class='section' id='section-1'>
37 <div class='docs'>
38 <div class='octowrap'>
39 <a class='octothorpe' href='#section-1'>#</a>
40 </div>
41 <p>Convert column index to column letter</p>
42 </div>
43 <div class='code'>
44 <div class="highlight"><pre><span class="k">def</span> <span class="nf">colnum_string</span><span class="p">(</span><span class="n">num</span><span class="p">):</span></pre></div>
45 </div>
46 </div>
47 <div class='clearall'></div>
48 <div class='section' id='section-2'>
49 <div class='docs'>
50 <div class='octowrap'>
51 <a class='octothorpe' href='#section-2'>#</a>
52 </div>
53
54 </div>
55 <div class='code'>
56 <div class="highlight"><pre> <span class="n">string</span> <span class="o">=</span> <span class="s2">&quot;&quot;</span>
57 <span class="k">while</span> <span class="n">num</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
58 <span class="n">num</span><span class="p">,</span> <span class="n">remainder</span> <span class="o">=</span> <span class="nb">divmod</span><span class="p">(</span><span class="n">num</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">26</span><span class="p">)</span>
59 <span class="n">string</span> <span class="o">=</span> <span class="nb">chr</span><span class="p">(</span><span class="mi">65</span> <span class="o">+</span> <span class="n">remainder</span><span class="p">)</span> <span class="o">+</span> <span class="n">string</span>
60 <span class="k">return</span> <span class="n">string</span></pre></div>
61 </div>
62 </div>
63 <div class='clearall'></div>
64 <div class='section' id='section-3'>
65 <div class='docs'>
66 <div class='octowrap'>
67 <a class='octothorpe' href='#section-3'>#</a>
68 </div>
69 <hr />
70 </div>
71 <div class='code'>
72 <div class="highlight"><pre></pre></div>
73 </div>
74 </div>
75 <div class='clearall'></div>
76 <div class='section' id='section-4'>
77 <div class='docs'>
78 <div class='octowrap'>
79 <a class='octothorpe' href='#section-4'>#</a>
80 </div>
81 <p>The goal of this function is to get the JSON schema of the sheet you pass in. Our return values here
82are <code>sheet_json_schema</code> and <code>columns</code>, an <code>OrderedDict</code> and a list respectively.</p>
83 </div>
84 <div class='code'>
85 <div class="highlight"><pre></pre></div>
86 </div>
87 </div>
88 <div class='clearall'></div>
89 <div class='section' id='section-5'>
90 <div class='docs'>
91 <div class='octowrap'>
92 <a class='octothorpe' href='#section-5'>#</a>
93 </div>
94 <p>This function is massive and we will discuss it in the following parts:</p>
95 </div>
96 <div class='code'>
97 <div class="highlight"><pre></pre></div>
98 </div>
99 </div>
100 <div class='clearall'></div>
101 <div class='section' id='section-6'>
102 <div class='docs'>
103 <div class='octowrap'>
104 <a class='octothorpe' href='#section-6'>#</a>
105 </div>
106 <ul>
107<li>Part 1</li>
108<li>Part 2<ul>
109<li>Part 2A</li>
110<li>Part 2B<ul>
111<li>Part 3</li>
112<li>Part 4</li>
113</ul>
114</li>
115</ul>
116</li>
117</ul>
118 </div>
119 <div class='code'>
120 <div class="highlight"><pre></pre></div>
121 </div>
122 </div>
123 <div class='clearall'></div>
124 <div class='section' id='section-7'>
125 <div class='docs'>
126 <div class='octowrap'>
127 <a class='octothorpe' href='#section-7'>#</a>
128 </div>
129 <p>Part 1 is just setting up constants and variables. We can skim through this part.</p>
130 </div>
131 <div class='code'>
132 <div class="highlight"><pre></pre></div>
133 </div>
134 </div>
135 <div class='clearall'></div>
136 <div class='section' id='section-8'>
137 <div class='docs'>
138 <div class='octowrap'>
139 <a class='octothorpe' href='#section-8'>#</a>
140 </div>
141 <p>Part 2 is split into two parts because it&rsquo;s a loop over the column and there&rsquo;s two ways to handle a
142column.</p>
143 </div>
144 <div class='code'>
145 <div class="highlight"><pre></pre></div>
146 </div>
147 </div>
148 <div class='clearall'></div>
149 <div class='section' id='section-9'>
150 <div class='docs'>
151 <div class='octowrap'>
152 <a class='octothorpe' href='#section-9'>#</a>
153 </div>
154 <p>We&rsquo;ll consider 2A to be the &ldquo;skip this column&rdquo; case.</p>
155 </div>
156 <div class='code'>
157 <div class="highlight"><pre></pre></div>
158 </div>
159 </div>
160 <div class='clearall'></div>
161 <div class='section' id='section-10'>
162 <div class='docs'>
163 <div class='octowrap'>
164 <a class='octothorpe' href='#section-10'>#</a>
165 </div>
166 <p>We&rsquo;ll consider 2B as the &ldquo;not skipped&rdquo; case. In which we determine a field&rsquo;s type (Part 3) and then
167use the type to decide the JSON Schema (Part 4).</p>
168 </div>
169 <div class='code'>
170 <div class="highlight"><pre></pre></div>
171 </div>
172 </div>
173 <div class='clearall'></div>
174 <div class='section' id='section-11'>
175 <div class='docs'>
176 <div class='octowrap'>
177 <a class='octothorpe' href='#section-11'>#</a>
178 </div>
179 <hr />
180<p>Create sheet_metadata_json with columns from sheet</p>
181 </div>
182 <div class='code'>
183 <div class="highlight"><pre><span class="k">def</span> <span class="nf">get_sheet_schema_columns</span><span class="p">(</span><span class="n">sheet</span><span class="p">):</span></pre></div>
184 </div>
185 </div>
186 <div class='clearall'></div>
187 <div class='section' id='section-12'>
188 <div class='docs'>
189 <div class='octowrap'>
190 <a class='octothorpe' href='#section-12'>#</a>
191 </div>
192 <p>The input to this function is shaped like</p>
193<pre><code class="language-JSON">{
194 &quot;data&quot; : [
195 {
196 &quot;rowData&quot;: [
197 {&quot;values&quot;: &lt;thing 1&gt;},
198 {&quot;values&quot;: &lt;thing 2&gt;}
199 ]
200 }
201 ]
202}
203</code></pre>
204<p>Return Values</p>
205<ul>
206<li>
207<p>columns</p>
208<ul>
209<li>A <code>column</code> that goes into <code>columns</code> is a dictionary with keys <code>"columnIndex"</code>,
210<code>"columnLetter"</code>, <code>"columnName"</code>, <code>"columnType"</code>, and <code>"columnSkipped"</code>.</li>
211</ul>
212</li>
213<li>
214<p>sheet_json_schema</p>
215<ul>
216<li>A <code>col_properties</code> that goes into <code>sheet_json_schema['properties'][column_name]</code> is the JSON
217schema of <code>column_name</code>.</li>
218</ul>
219</li>
220</ul>
221 </div>
222 <div class='code'>
223 <div class="highlight"><pre> <span class="n">sheet_title</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;properties&#39;</span><span class="p">,</span> <span class="p">{})</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;title&#39;</span><span class="p">)</span>
224 <span class="n">sheet_json_schema</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">()</span>
225 <span class="n">data</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="nb">iter</span><span class="p">(</span><span class="n">sheet</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="p">[])),</span> <span class="p">{})</span>
226 <span class="n">row_data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;rowData&#39;</span><span class="p">,</span> <span class="p">[])</span>
227 <span class="k">if</span> <span class="n">row_data</span> <span class="o">==</span> <span class="p">[]:</span>
228 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;SKIPPING Empty Sheet: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sheet_title</span><span class="p">))</span>
229 <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span></pre></div>
230 </div>
231 </div>
232 <div class='clearall'></div>
233 <div class='section' id='section-13'>
234 <div class='docs'>
235 <div class='octowrap'>
236 <a class='octothorpe' href='#section-13'>#</a>
237 </div>
238 <p>So this function starts by unpacking it into two lists, <code>headers</code> and <code>first_values</code>, which is
239&ldquo;thing 1&rdquo; and &ldquo;thing 2&rdquo; respectively.</p>
240 </div>
241 <div class='code'>
242 <div class="highlight"><pre> <span class="n">headers</span> <span class="o">=</span> <span class="n">row_data</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;values&#39;</span><span class="p">,</span> <span class="p">[])</span>
243 <span class="n">first_values</span> <span class="o">=</span> <span class="n">row_data</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;values&#39;</span><span class="p">,</span> <span class="p">[])</span></pre></div>
244 </div>
245 </div>
246 <div class='clearall'></div>
247 <div class='section' id='section-14'>
248 <div class='docs'>
249 <div class='octowrap'>
250 <a class='octothorpe' href='#section-14'>#</a>
251 </div>
252 <p>All of the objects in <code>headers</code> and <code>first_values</code> have the following shape:</p>
253 </div>
254 <div class='code'>
255 <div class="highlight"><pre></pre></div>
256 </div>
257 </div>
258 <div class='clearall'></div>
259 <div class='section' id='section-15'>
260 <div class='docs'>
261 <div class='octowrap'>
262 <a class='octothorpe' href='#section-15'>#</a>
263 </div>
264 <pre><code class="language-JSON">{
265 &quot;userEnteredValue&quot;: {&quot;stringValue&quot;: &quot;time1&quot;},
266 &quot;effectiveValue&quot;: {&quot;stringValue&quot;: &quot;time1&quot;},
267 &quot;formattedValue&quot;: &quot;time1&quot;,
268 &quot;userEnteredFormat&quot;: {...},
269 &quot;effectiveFormat&quot;: {}
270}
271</code></pre>
272 </div>
273 <div class='code'>
274 <div class="highlight"><pre></pre></div>
275 </div>
276 </div>
277 <div class='clearall'></div>
278 <div class='section' id='section-16'>
279 <div class='docs'>
280 <div class='octowrap'>
281 <a class='octothorpe' href='#section-16'>#</a>
282 </div>
283 <p>The base Sheet schema</p>
284 </div>
285 <div class='code'>
286 <div class="highlight"><pre> <span class="n">sheet_json_schema</span> <span class="o">=</span> <span class="p">{</span>
287 <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;object&#39;</span><span class="p">,</span>
288 <span class="s1">&#39;additionalProperties&#39;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
289 <span class="s1">&#39;properties&#39;</span><span class="p">:</span> <span class="p">{</span>
290 <span class="s1">&#39;__sdc_spreadsheet_id&#39;</span><span class="p">:</span> <span class="p">{</span>
291 <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]</span>
292 <span class="p">},</span>
293 <span class="s1">&#39;__sdc_sheet_id&#39;</span><span class="p">:</span> <span class="p">{</span>
294 <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;integer&#39;</span><span class="p">]</span>
295 <span class="p">},</span>
296 <span class="s1">&#39;__sdc_row&#39;</span><span class="p">:</span> <span class="p">{</span>
297 <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;integer&#39;</span><span class="p">]</span>
298 <span class="p">}</span>
299 <span class="p">}</span>
300 <span class="p">}</span>
301
302 <span class="n">header_list</span> <span class="o">=</span> <span class="p">[]</span> <span class="c1"># used for checking uniqueness</span>
303 <span class="n">columns</span> <span class="o">=</span> <span class="p">[]</span>
304 <span class="n">prior_header</span> <span class="o">=</span> <span class="kc">None</span>
305 <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
306 <span class="n">skipped</span> <span class="o">=</span> <span class="mi">0</span></pre></div>
307 </div>
308 </div>
309 <div class='clearall'></div>
310 <div class='section' id='section-17'>
311 <div class='docs'>
312 <div class='octowrap'>
313 <a class='octothorpe' href='#section-17'>#</a>
314 </div>
315 <p>We loop over the columns in the <code>headers</code> list and accummulate an object in each return
316variable.</p>
317 </div>
318 <div class='code'>
319 <div class="highlight"><pre> <span class="k">for</span> <span class="n">header</span> <span class="ow">in</span> <span class="n">headers</span><span class="p">:</span>
320 <span class="n">column_index</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span>
321 <span class="n">column_letter</span> <span class="o">=</span> <span class="n">colnum_string</span><span class="p">(</span><span class="n">column_index</span><span class="p">)</span>
322 <span class="n">header_value</span> <span class="o">=</span> <span class="n">header</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;formattedValue&#39;</span><span class="p">)</span>
323 <span class="k">if</span> <span class="n">header_value</span><span class="p">:</span> <span class="c1"># NOT skipped</span></pre></div>
324 </div>
325 </div>
326 <div class='clearall'></div>
327 <div class='section' id='section-18'>
328 <div class='docs'>
329 <div class='octowrap'>
330 <a class='octothorpe' href='#section-18'>#</a>
331 </div>
332 <p>Assuming the column we are looking at does not get skipped, we have to figure out the
333schema.</p>
334 </div>
335 <div class='code'>
336 <div class="highlight"><pre> <span class="n">column_is_skipped</span> <span class="o">=</span> <span class="kc">False</span></pre></div>
337 </div>
338 </div>
339 <div class='clearall'></div>
340 <div class='section' id='section-19'>
341 <div class='docs'>
342 <div class='octowrap'>
343 <a class='octothorpe' href='#section-19'>#</a>
344 </div>
345 <p>First we reset the counter for consecutive skipped columns.</p>
346 </div>
347 <div class='code'>
348 <div class="highlight"><pre> <span class="n">skipped</span> <span class="o">=</span> <span class="mi">0</span></pre></div>
349 </div>
350 </div>
351 <div class='clearall'></div>
352 <div class='section' id='section-20'>
353 <div class='docs'>
354 <div class='octowrap'>
355 <a class='octothorpe' href='#section-20'>#</a>
356 </div>
357 <p>Then we let the name of this column be the value of <code>formattedValue</code> from the <code>header</code>
358object we are looking at. This seems to be the value rendered in Google Sheets in the
359cell.</p>
360 </div>
361 <div class='code'>
362 <div class="highlight"><pre> <span class="n">column_name</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">header_value</span><span class="p">)</span></pre></div>
363 </div>
364 </div>
365 <div class='clearall'></div>
366 <div class='section' id='section-21'>
367 <div class='docs'>
368 <div class='octowrap'>
369 <a class='octothorpe' href='#section-21'>#</a>
370 </div>
371 <p>We assert that this column name is unique or else we raise a &ldquo;Duplicate Header Error&rdquo;.</p>
372 </div>
373 <div class='code'>
374 <div class="highlight"><pre> <span class="k">if</span> <span class="n">column_name</span> <span class="ow">in</span> <span class="n">header_list</span><span class="p">:</span>
375 <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;DUPLICATE HEADER ERROR: SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL: </span><span class="si">{}</span><span class="s1">1&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
376 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">))</span>
377 <span class="n">header_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">column_name</span><span class="p">)</span></pre></div>
378 </div>
379 </div>
380 <div class='clearall'></div>
381 <div class='section' id='section-22'>
382 <div class='docs'>
383 <div class='octowrap'>
384 <a class='octothorpe' href='#section-22'>#</a>
385 </div>
386 <p>We attempt to grab the value in the second row of the sheet (the first row of data)
387associated with this column. Remember this row we are looking at is stored in
388<code>first_values</code>. Note again that <code>headers</code> and <code>first_values</code> have the same shape.</p>
389 </div>
390 <div class='code'>
391 <div class="highlight"><pre> <span class="n">first_value</span> <span class="o">=</span> <span class="kc">None</span>
392 <span class="k">try</span><span class="p">:</span>
393 <span class="n">first_value</span> <span class="o">=</span> <span class="n">first_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
394 <span class="k">except</span> <span class="ne">IndexError</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
395 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;NO VALUE IN 2ND ROW FOR HEADER. SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL: </span><span class="si">{}</span><span class="s1">2. </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
396 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">,</span> <span class="n">err</span><span class="p">))</span>
397 <span class="n">first_value</span> <span class="o">=</span> <span class="p">{}</span>
398 <span class="n">first_values</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">first_value</span><span class="p">)</span>
399 <span class="k">pass</span>
400
401 <span class="n">column_effective_value</span> <span class="o">=</span> <span class="n">first_value</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;effectiveValue&#39;</span><span class="p">,</span> <span class="p">{})</span>
402
403 <span class="n">col_val</span> <span class="o">=</span> <span class="kc">None</span>
404 <span class="k">if</span> <span class="n">column_effective_value</span> <span class="o">==</span> <span class="p">{}:</span>
405 <span class="n">column_effective_value_type</span> <span class="o">=</span> <span class="s1">&#39;stringValue&#39;</span>
406 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;WARNING: NO VALUE IN 2ND ROW FOR HEADER. SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL: </span><span class="si">{}</span><span class="s1">2.&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
407 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">))</span>
408 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39; Setting column datatype to STRING&#39;</span><span class="p">)</span>
409 <span class="k">else</span><span class="p">:</span></pre></div>
410 </div>
411 </div>
412 <div class='clearall'></div>
413 <div class='section' id='section-23'>
414 <div class='docs'>
415 <div class='octowrap'>
416 <a class='octothorpe' href='#section-23'>#</a>
417 </div>
418 <p>The tap calls the value of <code>"effectiveValue"</code> the <code>column_effective_value</code>. This
419dictionary can be empty or it can have a <code>key1</code> that looks like <code>"numberValue"</code>,
420<code>"stringValue"</code>, or <code>"boolValue"</code>. If the dictionary is empty, we force <code>key1</code> to
421be <code>"stringValue"</code>.</p>
422 </div>
423 <div class='code'>
424 <div class="highlight"><pre> <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="n">column_effective_value</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
425 <span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">&#39;numberValue&#39;</span><span class="p">,</span> <span class="s1">&#39;stringValue&#39;</span><span class="p">,</span> <span class="s1">&#39;boolValue&#39;</span><span class="p">):</span>
426 <span class="n">column_effective_value_type</span> <span class="o">=</span> <span class="n">key</span>
427 <span class="n">col_val</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">val</span><span class="p">)</span></pre></div>
428 </div>
429 </div>
430 <div class='clearall'></div>
431 <div class='section' id='section-24'>
432 <div class='docs'>
433 <div class='octowrap'>
434 <a class='octothorpe' href='#section-24'>#</a>
435 </div>
436 <p>Sometimes <code>key1</code> also looks like <code>"errorType"</code> or <code>"formulaType"</code>, but in
437these cases, we raise a &ldquo;Data Type Error&rdquo; error immediately.</p>
438 </div>
439 <div class='code'>
440 <div class="highlight"><pre> <span class="k">elif</span> <span class="n">key</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">&#39;errorType&#39;</span><span class="p">,</span> <span class="s1">&#39;formulaType&#39;</span><span class="p">):</span>
441 <span class="n">col_val</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">val</span><span class="p">)</span>
442 <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;DATA TYPE ERROR 2ND ROW VALUE: SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL: </span><span class="si">{}</span><span class="s1">2, TYPE: </span><span class="si">{}</span><span class="s1">, VALUE: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
443 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">col_val</span><span class="p">))</span>
444
445 <span class="n">column_number_format</span> <span class="o">=</span> <span class="n">first_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;effectiveFormat&#39;</span><span class="p">,</span> <span class="p">{})</span><span class="o">.</span><span class="n">get</span><span class="p">(</span>
446 <span class="s1">&#39;numberFormat&#39;</span><span class="p">,</span> <span class="p">{})</span></pre></div>
447 </div>
448 </div>
449 <div class='clearall'></div>
450 <div class='section' id='section-25'>
451 <div class='docs'>
452 <div class='octowrap'>
453 <a class='octothorpe' href='#section-25'>#</a>
454 </div>
455 <p>column_number_format_type = UNSPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE</p>
456<ul>
457<li>TIME, DATE_TIME, SCIENTIFIC</li>
458<li>https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType</li>
459</ul>
460 </div>
461 <div class='code'>
462 <div class="highlight"><pre> <span class="n">column_number_format_type</span> <span class="o">=</span> <span class="n">column_number_format</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;type&#39;</span><span class="p">)</span></pre></div>
463 </div>
464 </div>
465 <div class='clearall'></div>
466 <div class='section' id='section-26'>
467 <div class='docs'>
468 <div class='octowrap'>
469 <a class='octothorpe' href='#section-26'>#</a>
470 </div>
471 <p>the giant if-elif-else block: All it does is set a variable <code>col_properties</code> and
472<code>column_gs_type</code> based on the values of <code>column_effective_value_type</code> and
473<code>column_number_format_type</code>.</p>
474 </div>
475 <div class='code'>
476 <div class="highlight"><pre> <span class="n">column_format</span> <span class="o">=</span> <span class="kc">None</span>
477 <span class="k">if</span> <span class="n">column_effective_value</span> <span class="o">==</span> <span class="p">{}:</span>
478 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]}</span>
479 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;stringValue&#39;</span>
480 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;WARNING: 2ND ROW VALUE IS BLANK: SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL: </span><span class="si">{}</span><span class="s1">2&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
481 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">))</span>
482 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39; Setting column datatype to STRING&#39;</span><span class="p">)</span></pre></div>
483 </div>
484 </div>
485 <div class='clearall'></div>
486 <div class='section' id='section-27'>
487 <div class='docs'>
488 <div class='octowrap'>
489 <a class='octothorpe' href='#section-27'>#</a>
490 </div>
491 <p>column_effective_value_type = numberValue, stringValue, boolValue</p>
492<ul>
493<li>INVALID: errorType, formulaType</li>
494<li>https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue</li>
495</ul>
496 </div>
497 <div class='code'>
498 <div class="highlight"><pre> <span class="k">elif</span> <span class="n">column_effective_value_type</span> <span class="o">==</span> <span class="s1">&#39;stringValue&#39;</span><span class="p">:</span>
499 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]}</span>
500 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;stringValue&#39;</span>
501 <span class="k">elif</span> <span class="n">column_effective_value_type</span> <span class="o">==</span> <span class="s1">&#39;boolValue&#39;</span><span class="p">:</span>
502 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;boolean&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]}</span>
503 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;boolValue&#39;</span>
504 <span class="k">elif</span> <span class="n">column_effective_value_type</span> <span class="o">==</span> <span class="s1">&#39;numberValue&#39;</span><span class="p">:</span>
505 <span class="k">if</span> <span class="n">column_number_format_type</span> <span class="o">==</span> <span class="s1">&#39;DATE_TIME&#39;</span><span class="p">:</span>
506 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span>
507 <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">],</span>
508 <span class="s1">&#39;format&#39;</span><span class="p">:</span> <span class="s1">&#39;date-time&#39;</span>
509 <span class="p">}</span>
510 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;numberType.DATE_TIME&#39;</span>
511 <span class="k">elif</span> <span class="n">column_number_format_type</span> <span class="o">==</span> <span class="s1">&#39;DATE&#39;</span><span class="p">:</span>
512 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span>
513 <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">],</span>
514 <span class="s1">&#39;format&#39;</span><span class="p">:</span> <span class="s1">&#39;date&#39;</span>
515 <span class="p">}</span>
516 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;numberType.DATE&#39;</span>
517 <span class="k">elif</span> <span class="n">column_number_format_type</span> <span class="o">==</span> <span class="s1">&#39;TIME&#39;</span><span class="p">:</span>
518 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span>
519 <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">],</span>
520 <span class="s1">&#39;format&#39;</span><span class="p">:</span> <span class="s1">&#39;time&#39;</span>
521 <span class="p">}</span>
522 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;numberType.TIME&#39;</span>
523 <span class="k">elif</span> <span class="n">column_number_format_type</span> <span class="o">==</span> <span class="s1">&#39;TEXT&#39;</span><span class="p">:</span>
524 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]}</span>
525 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;stringValue&#39;</span>
526 <span class="k">else</span><span class="p">:</span>
527 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;number&#39;</span><span class="p">,</span> <span class="s1">&#39;multipleOf&#39;</span><span class="p">:</span> <span class="mf">1e-15</span><span class="p">}</span>
528 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;numberType&#39;</span>
529 <span class="k">else</span><span class="p">:</span>
530 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]}</span>
531 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;unsupportedValue&#39;</span>
532 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;WARNING: UNSUPPORTED 2ND ROW VALUE: SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL: </span><span class="si">{}</span><span class="s1">2, TYPE: </span><span class="si">{}</span><span class="s1">, VALUE: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
533 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">,</span> <span class="n">column_effective_value_type</span><span class="p">,</span> <span class="n">col_val</span><span class="p">))</span>
534 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;Converting to string.&#39;</span><span class="p">)</span>
535 <span class="k">else</span><span class="p">:</span> <span class="c1"># skipped</span></pre></div>
536 </div>
537 </div>
538 <div class='clearall'></div>
539 <div class='section' id='section-28'>
540 <div class='docs'>
541 <div class='octowrap'>
542 <a class='octothorpe' href='#section-28'>#</a>
543 </div>
544 <p>We note that we are skipping this column. It still gets added to the schema though as
545a string field. The only other notable thing about skipped columns is the we create
546the field name for it, and it looks like <code>"__sdc_skip_col_XY"</code>, where the <code>XY</code> goes
547from <code>"00"</code>, <code>"01"</code>, to <code>"99"</code>.</p>
548 </div>
549 <div class='code'>
550 <div class="highlight"><pre> <span class="n">column_is_skipped</span> <span class="o">=</span> <span class="kc">True</span>
551 <span class="n">skipped</span> <span class="o">=</span> <span class="n">skipped</span> <span class="o">+</span> <span class="mi">1</span>
552 <span class="n">column_index_str</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">column_index</span><span class="p">)</span><span class="o">.</span><span class="n">zfill</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
553 <span class="n">column_name</span> <span class="o">=</span> <span class="s1">&#39;__sdc_skip_col_</span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">column_index_str</span><span class="p">)</span>
554 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]}</span>
555 <span class="n">column_gs_type</span> <span class="o">=</span> <span class="s1">&#39;stringValue&#39;</span>
556 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;WARNING: SKIPPED COLUMN; NO COLUMN HEADER. SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL: </span><span class="si">{}</span><span class="s1">1&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
557 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">))</span>
558 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39; This column will be skipped during data loading.&#39;</span><span class="p">)</span>
559
560 <span class="k">if</span> <span class="n">skipped</span> <span class="o">&gt;=</span> <span class="mi">2</span><span class="p">:</span>
561 <span class="n">sheet_json_schema</span><span class="p">[</span><span class="s1">&#39;properties&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">prior_header</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
562 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;TWO CONSECUTIVE SKIPPED COLUMNS. STOPPING SCAN AT: SHEET: </span><span class="si">{}</span><span class="s1">, COL: </span><span class="si">{}</span><span class="s1">, CELL </span><span class="si">{}</span><span class="s1">1&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
563 <span class="n">sheet_title</span><span class="p">,</span> <span class="n">column_name</span><span class="p">,</span> <span class="n">column_letter</span><span class="p">))</span>
564 <span class="k">break</span>
565
566 <span class="k">else</span><span class="p">:</span>
567 <span class="n">column</span> <span class="o">=</span> <span class="p">{}</span>
568 <span class="n">column</span> <span class="o">=</span> <span class="p">{</span>
569 <span class="s1">&#39;columnIndex&#39;</span><span class="p">:</span> <span class="n">column_index</span><span class="p">,</span>
570 <span class="s1">&#39;columnLetter&#39;</span><span class="p">:</span> <span class="n">column_letter</span><span class="p">,</span>
571 <span class="s1">&#39;columnName&#39;</span><span class="p">:</span> <span class="n">column_name</span><span class="p">,</span>
572 <span class="s1">&#39;columnType&#39;</span><span class="p">:</span> <span class="n">column_gs_type</span><span class="p">,</span>
573 <span class="s1">&#39;columnSkipped&#39;</span><span class="p">:</span> <span class="n">column_is_skipped</span>
574 <span class="p">}</span>
575 <span class="n">columns</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">)</span>
576
577 <span class="k">if</span> <span class="n">column_gs_type</span> <span class="ow">in</span> <span class="p">{</span><span class="s1">&#39;numberType.DATE_TIME&#39;</span><span class="p">,</span> <span class="s1">&#39;numberType.DATE&#39;</span><span class="p">,</span> <span class="s1">&#39;numberType.TIME&#39;</span><span class="p">,</span> <span class="s1">&#39;numberType&#39;</span><span class="p">}:</span>
578 <span class="n">col_properties</span> <span class="o">=</span> <span class="p">{</span>
579 <span class="s1">&#39;anyOf&#39;</span><span class="p">:</span> <span class="p">[</span>
580 <span class="n">col_properties</span><span class="p">,</span>
581 <span class="p">{</span><span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;null&#39;</span><span class="p">,</span> <span class="s1">&#39;string&#39;</span><span class="p">]}</span>
582 <span class="p">]</span>
583 <span class="p">}</span>
584
585 <span class="n">sheet_json_schema</span><span class="p">[</span><span class="s1">&#39;properties&#39;</span><span class="p">][</span><span class="n">column_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">col_properties</span>
586
587 <span class="n">prior_header</span> <span class="o">=</span> <span class="n">column_name</span>
588 <span class="n">i</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span>
589
590 <span class="k">return</span> <span class="n">sheet_json_schema</span><span class="p">,</span> <span class="n">columns</span></pre></div>
591 </div>
592 </div>
593 <div class='clearall'></div>
594 <div class='section' id='section-29'>
595 <div class='docs'>
596 <div class='octowrap'>
597 <a class='octothorpe' href='#section-29'>#</a>
598 </div>
599 <p>The point of this function seems to be (1) make a request to get a sheet (2) return the schema
600generated for this sheet by <code>schema.py:get_sheet_schema_columns</code>.</p>
601<p><code>get_sheet_metadata()</code> sets up a lot of variables to ultimately make a request to</p>
602<pre><code class="language-Text">https://sheets.googleapis.com/v4/spreadsheets/my-spreadsheet-id?includeGridData=true&amp;ranges='my-sheet-title'!1:2
603</code></pre>
604 </div>
605 <div class='code'>
606 <div class="highlight"><pre></pre></div>
607 </div>
608 </div>
609 <div class='clearall'></div>
610 <div class='section' id='section-30'>
611 <div class='docs'>
612 <div class='octowrap'>
613 <a class='octothorpe' href='#section-30'>#</a>
614 </div>
615 <p>Let&rsquo;s dissect the query params here a bit.</p>
616 </div>
617 <div class='code'>
618 <div class="highlight"><pre></pre></div>
619 </div>
620 </div>
621 <div class='clearall'></div>
622 <div class='section' id='section-31'>
623 <div class='docs'>
624 <div class='octowrap'>
625 <a class='octothorpe' href='#section-31'>#</a>
626 </div>
627 <p><code>includeGridData</code> is false by default and setting this to true lets us get &ldquo;Grid data&rdquo;. If you
628compare the same request but with that value flipped, then you&rsquo;ll notice the <code>includeGridData=false</code>
629gives you a relatively small response with no data in it. It seems like just a bunch of metadata.</p>
630 </div>
631 <div class='code'>
632 <div class="highlight"><pre></pre></div>
633 </div>
634 </div>
635 <div class='clearall'></div>
636 <div class='section' id='section-32'>
637 <div class='docs'>
638 <div class='octowrap'>
639 <a class='octothorpe' href='#section-32'>#</a>
640 </div>
641 <p><code>ranges</code> controls the rows returned.</p>
642 </div>
643 <div class='code'>
644 <div class="highlight"><pre><span class="k">def</span> <span class="nf">get_sheet_metadata</span><span class="p">(</span><span class="n">sheet</span><span class="p">,</span> <span class="n">spreadsheet_id</span><span class="p">,</span> <span class="n">client</span><span class="p">):</span></pre></div>
645 </div>
646 </div>
647 <div class='clearall'></div>
648 <div class='section' id='section-33'>
649 <div class='docs'>
650 <div class='octowrap'>
651 <a class='octothorpe' href='#section-33'>#</a>
652 </div>
653 <p>Get Header Row and 1st data row (Rows 1 &amp; 2) from a Sheet on Spreadsheet w/ sheet_metadata query</p>
654<ul>
655<li>endpoint: spreadsheets/{spreadsheet_id}</li>
656<li>params: includeGridData = true, ranges = &lsquo;{sheet_title}&rsquo;!1:2
657This endpoint includes detailed metadata about each cell - incl. data type, formatting, etc.</li>
658</ul>
659 </div>
660 <div class='code'>
661 <div class="highlight"><pre> <span class="n">sheet_id</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;properties&#39;</span><span class="p">,</span> <span class="p">{})</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;sheetId&#39;</span><span class="p">)</span>
662 <span class="n">sheet_title</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;properties&#39;</span><span class="p">,</span> <span class="p">{})</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;title&#39;</span><span class="p">)</span>
663 <span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;sheet_id = </span><span class="si">{}</span><span class="s1">, sheet_title = </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sheet_id</span><span class="p">,</span> <span class="n">sheet_title</span><span class="p">))</span>
664
665 <span class="n">stream_name</span> <span class="o">=</span> <span class="s1">&#39;sheet_metadata&#39;</span>
666 <span class="n">stream_metadata</span> <span class="o">=</span> <span class="n">STREAMS</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">stream_name</span><span class="p">)</span>
667 <span class="n">api</span> <span class="o">=</span> <span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;api&#39;</span><span class="p">,</span> <span class="s1">&#39;sheets&#39;</span><span class="p">)</span>
668 <span class="n">params</span> <span class="o">=</span> <span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;params&#39;</span><span class="p">,</span> <span class="p">{})</span>
669 <span class="n">sheet_title_encoded</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">parse</span><span class="o">.</span><span class="n">quote_plus</span><span class="p">(</span><span class="n">sheet_title</span><span class="p">)</span>
670 <span class="n">sheet_title_escaped</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">escape</span><span class="p">(</span><span class="n">sheet_title</span><span class="p">)</span>
671 <span class="n">querystring</span> <span class="o">=</span> <span class="s1">&#39;&amp;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span>
672 <span class="p">[</span><span class="s1">&#39;</span><span class="si">%s</span><span class="s1">=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="k">for</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="ow">in</span> <span class="n">params</span><span class="o">.</span><span class="n">items</span><span class="p">()]</span>
673 <span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">{sheet_title}</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">sheet_title_encoded</span><span class="p">)</span>
674 <span class="n">path</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="si">{}</span><span class="s1">?</span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
675 <span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;path&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">{spreadsheet_id}</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">spreadsheet_id</span><span class="p">),</span>
676 <span class="n">querystring</span>
677 <span class="p">)</span></pre></div>
678 </div>
679 </div>
680 <div class='clearall'></div>
681 <div class='section' id='section-34'>
682 <div class='docs'>
683 <div class='octowrap'>
684 <a class='octothorpe' href='#section-34'>#</a>
685 </div>
686 <p>See the Footnotes for this response shape</p>
687 </div>
688 <div class='code'>
689 <div class="highlight"><pre> <span class="n">sheet_md_results</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">path</span><span class="o">=</span><span class="n">path</span><span class="p">,</span> <span class="n">api</span><span class="o">=</span><span class="n">api</span><span class="p">,</span> <span class="n">endpoint</span><span class="o">=</span><span class="n">sheet_title_escaped</span><span class="p">)</span>
690 <span class="n">sheet_metadata</span> <span class="o">=</span> <span class="n">sheet_md_results</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;sheets&#39;</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
691
692
693 <span class="k">try</span><span class="p">:</span></pre></div>
694 </div>
695 </div>
696 <div class='clearall'></div>
697 <div class='section' id='section-35'>
698 <div class='docs'>
699 <div class='octowrap'>
700 <a class='octothorpe' href='#section-35'>#</a>
701 </div>
702 <p>Create sheet_json_schema (for discovery/catalog) and columns (for sheet_metadata results)</p>
703 </div>
704 <div class='code'>
705 <div class="highlight"><pre> <span class="n">sheet_json_schema</span><span class="p">,</span> <span class="n">columns</span> <span class="o">=</span> <span class="n">get_sheet_schema_columns</span><span class="p">(</span><span class="n">sheet_metadata</span><span class="p">)</span>
706 <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">err</span><span class="p">:</span>
707 <span class="n">LOGGER</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
708 <span class="n">LOGGER</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s1">&#39;SKIPPING Malformed sheet: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sheet_title</span><span class="p">))</span>
709 <span class="n">sheet_json_schema</span><span class="p">,</span> <span class="n">columns</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
710
711 <span class="k">return</span> <span class="n">sheet_json_schema</span><span class="p">,</span> <span class="n">columns</span></pre></div>
712 </div>
713 </div>
714 <div class='clearall'></div>
715 <div class='section' id='section-36'>
716 <div class='docs'>
717 <div class='octowrap'>
718 <a class='octothorpe' href='#section-36'>#</a>
719 </div>
720
721 </div>
722 <div class='code'>
723 <div class="highlight"><pre><span class="k">def</span> <span class="nf">get_abs_path</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
724 <span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">realpath</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)),</span> <span class="n">path</span><span class="p">)</span></pre></div>
725 </div>
726 </div>
727 <div class='clearall'></div>
728 <div class='section' id='section-37'>
729 <div class='docs'>
730 <div class='octowrap'>
731 <a class='octothorpe' href='#section-37'>#</a>
732 </div>
733 <p>We initialize our return variables, <code>schemas</code> and <code>field_metadata</code> to empty dictionaries.</p>
734<p>We loop over each stream in <code>streams.py:STREAMS</code>. We load the static JSON file into memory - all
735four streams currently have some static schema. We store this on our return variable <code>schemas</code>
736under the stream name.</p>
737<p>We then call <code>singer.metadata.get_standard_metadata()</code> passing in whatever metadata we do have
738(key properties, valid replication keys, the replication method). The return value here is
739stored on our return variable <code>field_metadata</code> under the stream name.</p>
740 </div>
741 <div class='code'>
742 <div class="highlight"><pre><span class="k">def</span> <span class="nf">get_schemas</span><span class="p">(</span><span class="n">client</span><span class="p">,</span> <span class="n">spreadsheet_id</span><span class="p">):</span></pre></div>
743 </div>
744 </div>
745 <div class='clearall'></div>
746 <div class='section' id='section-38'>
747 <div class='docs'>
748 <div class='octowrap'>
749 <a class='octothorpe' href='#section-38'>#</a>
750 </div>
751
752 </div>
753 <div class='code'>
754 <div class="highlight"><pre> <span class="n">schemas</span> <span class="o">=</span> <span class="p">{}</span>
755 <span class="n">field_metadata</span> <span class="o">=</span> <span class="p">{}</span>
756
757 <span class="k">for</span> <span class="n">stream_name</span><span class="p">,</span> <span class="n">stream_metadata</span> <span class="ow">in</span> <span class="n">STREAMS</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
758 <span class="n">schema_path</span> <span class="o">=</span> <span class="n">get_abs_path</span><span class="p">(</span><span class="s1">&#39;schemas/</span><span class="si">{}</span><span class="s1">.json&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">stream_name</span><span class="p">))</span>
759 <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">schema_path</span><span class="p">)</span> <span class="k">as</span> <span class="n">file</span><span class="p">:</span>
760 <span class="n">schema</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">file</span><span class="p">)</span>
761 <span class="n">schemas</span><span class="p">[</span><span class="n">stream_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">schema</span>
762 <span class="n">mdata</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="n">new</span><span class="p">()</span>
763
764 <span class="n">mdata</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="n">get_standard_metadata</span><span class="p">(</span>
765 <span class="n">schema</span><span class="o">=</span><span class="n">schema</span><span class="p">,</span>
766 <span class="n">key_properties</span><span class="o">=</span><span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;key_properties&#39;</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
767 <span class="n">valid_replication_keys</span><span class="o">=</span><span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;replication_keys&#39;</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
768 <span class="n">replication_method</span><span class="o">=</span><span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;replication_method&#39;</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
769 <span class="p">)</span>
770 <span class="n">field_metadata</span><span class="p">[</span><span class="n">stream_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">mdata</span></pre></div>
771 </div>
772 </div>
773 <div class='clearall'></div>
774 <div class='section' id='section-39'>
775 <div class='docs'>
776 <div class='octowrap'>
777 <a class='octothorpe' href='#section-39'>#</a>
778 </div>
779 <p>If we are handling the <code>"spreadsheet_metadata"</code> stream, we do some extra work to build the
780dynamic schemas of each Sheet we want to sync.. Otherwise, that&rsquo;s it.</p>
781 </div>
782 <div class='code'>
783 <div class="highlight"><pre> <span class="k">if</span> <span class="n">stream_name</span> <span class="o">==</span> <span class="s1">&#39;spreadsheet_metadata&#39;</span><span class="p">:</span></pre></div>
784 </div>
785 </div>
786 <div class='clearall'></div>
787 <div class='section' id='section-40'>
788 <div class='docs'>
789 <div class='octowrap'>
790 <a class='octothorpe' href='#section-40'>#</a>
791 </div>
792 <p>We ultimately end up making a <code>GET</code> to</p>
793 </div>
794 <div class='code'>
795 <div class="highlight"><pre></pre></div>
796 </div>
797 </div>
798 <div class='clearall'></div>
799 <div class='section' id='section-41'>
800 <div class='docs'>
801 <div class='octowrap'>
802 <a class='octothorpe' href='#section-41'>#</a>
803 </div>
804 <pre><code class="language-Text">https://sheets.googleapis.com/v4/spreadsheets/my-spreadsheet-id?includeGridData=false
805</code></pre>
806 </div>
807 <div class='code'>
808 <div class="highlight"><pre></pre></div>
809 </div>
810 </div>
811 <div class='clearall'></div>
812 <div class='section' id='section-42'>
813 <div class='docs'>
814 <div class='octowrap'>
815 <a class='octothorpe' href='#section-42'>#</a>
816 </div>
817 <p>Notice this is <code>base_url + path + query_string</code>. There&rsquo;s code here to figure out and
818properly format <code>path</code> and <code>query_string</code>. I&rsquo;m not sure why we don&rsquo;t let <code>requests</code>
819handle this.</p>
820 </div>
821 <div class='code'>
822 <div class="highlight"><pre></pre></div>
823 </div>
824 </div>
825 <div class='clearall'></div>
826 <div class='section' id='section-43'>
827 <div class='docs'>
828 <div class='octowrap'>
829 <a class='octothorpe' href='#section-43'>#</a>
830 </div>
831 <p>We assume this request is successful and we store the <code>OrderedDict</code> return value as
832<code>spreadsheet_md_results</code>.</p>
833 </div>
834 <div class='code'>
835 <div class="highlight"><pre> <span class="n">api</span> <span class="o">=</span> <span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;api&#39;</span><span class="p">,</span> <span class="s1">&#39;sheets&#39;</span><span class="p">)</span>
836 <span class="n">params</span> <span class="o">=</span> <span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;params&#39;</span><span class="p">,</span> <span class="p">{})</span>
837 <span class="n">querystring</span> <span class="o">=</span> <span class="s1">&#39;&amp;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="s1">&#39;</span><span class="si">%s</span><span class="s1">=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="k">for</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="ow">in</span> <span class="n">params</span><span class="o">.</span><span class="n">items</span><span class="p">()])</span>
838 <span class="n">path</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="si">{}</span><span class="s1">?</span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
839 <span class="n">stream_metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;path&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">{spreadsheet_id}</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">spreadsheet_id</span><span class="p">),</span>
840 <span class="n">querystring</span>
841 <span class="p">)</span>
842
843 <span class="n">spreadsheet_md_results</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span>
844 <span class="n">path</span><span class="o">=</span><span class="n">path</span><span class="p">,</span>
845 <span class="n">params</span><span class="o">=</span><span class="n">querystring</span><span class="p">,</span>
846 <span class="n">api</span><span class="o">=</span><span class="n">api</span><span class="p">,</span>
847 <span class="n">endpoint</span><span class="o">=</span><span class="n">stream_name</span>
848 <span class="p">)</span></pre></div>
849 </div>
850 </div>
851 <div class='clearall'></div>
852 <div class='section' id='section-44'>
853 <div class='docs'>
854 <div class='octowrap'>
855 <a class='octothorpe' href='#section-44'>#</a>
856 </div>
857 <p>The response here is one of those &ldquo;envelope&rdquo; kinds. The data we care about is under
858the <code>"sheets"</code> key.</p>
859 </div>
860 <div class='code'>
861 <div class="highlight"><pre> <span class="n">sheets</span> <span class="o">=</span> <span class="n">spreadsheet_md_results</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;sheets&#39;</span><span class="p">)</span>
862 <span class="k">if</span> <span class="n">sheets</span><span class="p">:</span></pre></div>
863 </div>
864 </div>
865 <div class='clearall'></div>
866 <div class='section' id='section-45'>
867 <div class='docs'>
868 <div class='octowrap'>
869 <a class='octothorpe' href='#section-45'>#</a>
870 </div>
871 <p>Looping over this array, we call <code>schema.py:get_sheet_metadata</code>. This gets the
872JSON schema of each sheet found in this Google Doc. We use the sheet&rsquo;s title as
873the stream name here.</p>
874 </div>
875 <div class='code'>
876 <div class="highlight"><pre> <span class="k">for</span> <span class="n">sheet</span> <span class="ow">in</span> <span class="n">sheets</span><span class="p">:</span>
877 <span class="n">sheet_json_schema</span><span class="p">,</span> <span class="n">columns</span> <span class="o">=</span> <span class="n">get_sheet_metadata</span><span class="p">(</span><span class="n">sheet</span><span class="p">,</span> <span class="n">spreadsheet_id</span><span class="p">,</span> <span class="n">client</span><span class="p">)</span>
878
879 <span class="k">if</span> <span class="n">sheet_json_schema</span> <span class="ow">and</span> <span class="n">columns</span><span class="p">:</span>
880 <span class="n">sheet_title</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;properties&#39;</span><span class="p">,</span> <span class="p">{})</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;title&#39;</span><span class="p">)</span>
881 <span class="n">schemas</span><span class="p">[</span><span class="n">sheet_title</span><span class="p">]</span> <span class="o">=</span> <span class="n">sheet_json_schema</span>
882 <span class="n">sheet_mdata</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="n">new</span><span class="p">()</span>
883 <span class="n">sheet_mdata</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="n">get_standard_metadata</span><span class="p">(</span>
884 <span class="n">schema</span><span class="o">=</span><span class="n">sheet_json_schema</span><span class="p">,</span>
885 <span class="n">key_properties</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;__sdc_row&#39;</span><span class="p">],</span>
886 <span class="n">valid_replication_keys</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
887 <span class="n">replication_method</span><span class="o">=</span><span class="s1">&#39;FULL_TABLE&#39;</span>
888 <span class="p">)</span>
889 <span class="n">field_metadata</span><span class="p">[</span><span class="n">sheet_title</span><span class="p">]</span> <span class="o">=</span> <span class="n">sheet_mdata</span>
890
891 <span class="k">return</span> <span class="n">schemas</span><span class="p">,</span> <span class="n">field_metadata</span></pre></div>
892 </div>
893 </div>
894 <div class='clearall'></div>
895 <div class='section' id='section-46'>
896 <div class='docs'>
897 <div class='octowrap'>
898 <a class='octothorpe' href='#section-46'>#</a>
899 </div>
900 <h1>Footnotes</h1>
901<p>The shape of response is like, but note the tap stores this in the recursive <code>OrderedDict</code> structure</p>
902 </div>
903 <div class='code'>
904 <div class="highlight"><pre></pre></div>
905 </div>
906 </div>
907 <div class='clearall'></div>
908 <div class='section' id='section-47'>
909 <div class='docs'>
910 <div class='octowrap'>
911 <a class='octothorpe' href='#section-47'>#</a>
912 </div>
913 <pre><code class="language-JSON">{
914 &quot;spreadsheetid&quot;: &quot;my-id&quot;,
915 &quot;properties&quot;: {...},
916 &quot;sheets&quot;: [
917 {
918 &quot;properties&quot;: {},
919 &quot;data&quot;: [
920 {
921 &quot;rowData&quot;: [
922 {
923 &quot;values&quot;: [
924 {
925 &quot;userEnteredValue&quot;: {&quot;stringValue&quot;: &quot;time1&quot;},
926 &quot;effectiveValue&quot;: {&quot;stringValue&quot;: &quot;time1&quot;},
927 &quot;formattedValue&quot;: &quot;time1&quot;,
928 &quot;userEnteredFormat&quot;: {...},
929 &quot;effectiveFormat&quot;: {}
930 },
931 ...
932 ],
933 },
934 ...
935 ]
936 }
937 ]
938 },
939 ]
940}
941</code></pre>
942 </div>
943 <div class='code'>
944 <div class="highlight"><pre></pre></div>
945 </div>
946 </div>
947 <div class='clearall'></div>
948</div>
949</body>