Diffstat (limited to 'docs/streams.html')
-rw-r--r-- docs/streams.html 185
1 files changed, 185 insertions, 0 deletions
diff --git a/docs/streams.html b/docs/streams.html
new file mode 100644
index 0000000..8c9b6d9
--- /dev/null
+++ b/docs/streams.html
@@ -0,0 +1,185 @@
1<!DOCTYPE html>
2<html>
3<head>
4 <meta http-equiv="content-type" content="text/html;charset=utf-8">
5 <title>streams.py</title>
6 <link rel="stylesheet" href="pycco.css">
7</head>
8<body>
9<div id='container'>
10 <div id="background"></div>
11 <div class='section'>
12 <div class='docs'><h1>streams.py</h1></div>
13 </div>
14 <div class='clearall'>
15 <div class='section' id='section-0'>
16 <div class='docs'>
17 <div class='octowrap'>
18 <a class='octothorpe' href='#section-0'>#</a>
19 </div>
20 <p><code>streams.py:STREAMS</code> is an <code>OrderedDict</code>, only because we want to loop over it in the same order
21every time.</p>
22<p>It&rsquo;s still the same global variable found in taps of this style. It maps stream names to a
23dictionary describing the stream.</p>
24<p>Some notable things we learn in this file:</p>
25<ul>
26<li>
27<p><code>api</code> is either <code>"files"</code> or <code>"sheets"</code></p>
28</li>
29<li>
30<p>This value is used in <code>client.py:GoogleClient.request()</code> to switch the base URL of the request</p>
31</li>
32<li>
33<p><code>"file_metadata"</code> is the only incremental stream</p>
34</li>
35<li>
36<p>Full table streams include:</p>
37</li>
38<li><code>"spreadsheet_metadata"</code></li>
39<li><code>"sheet_metadata"</code></li>
40<li>
41<p><code>"sheets_loaded"</code></p>
42</li>
43<li>
44<p><code>"sheets_loaded"</code> is the only stream with a <code>"data_key"</code></p>
45</li>
46<li><code>data_key</code> is typically the name of the key that holds the data in &ldquo;envelope&rdquo; responses</li>
47</ul>
48 </div>
49 <div class='code'>
50 <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">OrderedDict</span></pre></div>
51 </div>
52 </div>
53 <div class='clearall'></div>
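<p>The <code>api</code> switch described above can be sketched as a lookup from the <code>api</code> value to a base URL. This is a minimal illustration, not the code in <code>client.py</code>: the helper name <code>build_url</code>, the <code>BASE_URLS</code> table, and the example spreadsheet id are hypothetical, and the two URLs are assumed to be the public Drive v3 and Sheets v4 endpoints.</p>

```python
# Hypothetical sketch: pick a base URL from a stream's "api" value.
# BASE_URLS and build_url are illustrative names, not taken from client.py.
BASE_URLS = {
    "files": "https://www.googleapis.com/drive/v3",
    "sheets": "https://sheets.googleapis.com/v4",
}


def build_url(api, path):
    """Join the base URL selected by `api` with a relative endpoint path."""
    if api not in BASE_URLS:
        raise ValueError("unknown api: {!r}".format(api))
    return "{}/{}".format(BASE_URLS[api], path.lstrip("/"))


# "abc123" is a made-up spreadsheet id used only for illustration.
file_url = build_url("files", "files/abc123")
sheet_url = build_url("sheets", "spreadsheets/abc123")
```

<p>Keeping the mapping in one table means a stream definition only has to name its API, and an unknown value fails loudly instead of silently hitting the wrong host.</p>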
54 <div class='section' id='section-1'>
55 <div class='docs'>
56 <div class='octowrap'>
57 <a class='octothorpe' href='#section-1'>#</a>
58 </div>
59 <p>streams: API URL endpoints to be called
60properties:
61 &lt;root node&gt;: Plural stream name for the endpoint
62 path: API endpoint relative path, when added to the base URL, creates the full path,
63 default = stream_name
64 key_properties: Primary key fields for identifying an endpoint record.
65 replication_method: INCREMENTAL or FULL_TABLE
66 replication_keys: bookmark_field(s), typically a date-time, used for filtering the results
67 and setting the state
68 params: Query, sort, and other endpoint specific parameters; default = {}
69 data_key: JSON element containing the results list for the endpoint;
70 default = root (no data_key)</p>
71 </div>
72 <div class='code'>
73 <div class="highlight"><pre></pre></div>
74 </div>
75 </div>
76 <div class='clearall'></div>
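<p>The defaults listed above (<code>path</code> falls back to the stream name, <code>params</code> to an empty dict, and a missing <code>data_key</code> means the records sit at the root of the response) could be applied roughly as follows. The function names <code>resolve_stream</code> and <code>extract_records</code> and the sample response are hypothetical; the tap itself may handle these defaults differently.</p>

```python
def resolve_stream(stream_name, config):
    """Return a copy of `config` with the documented defaults filled in."""
    return {
        **config,
        "path": config.get("path", stream_name),   # default = stream_name
        "params": config.get("params", {}),        # default = {}
        "data_key": config.get("data_key"),        # None means "use the root"
    }


def extract_records(response, data_key=None):
    """Unwrap an envelope response when a data_key is set, else return the root."""
    if data_key is None:
        return response
    return response.get(data_key, [])


# Made-up inputs, for illustration only.
resolved = resolve_stream("file_metadata", {"api": "files", "key_properties": ["id"]})
rows = extract_records(
    {"range": "Sheet1!A2:B3", "values": [["a", 1], ["b", 2]]},
    data_key="values",
)
```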
77 <div class='section' id='section-2'>
78 <div class='docs'>
79 <div class='octowrap'>
80 <a class='octothorpe' href='#section-2'>#</a>
81 </div>
82 <p>file_metadata: Queries the Google Drive API to get file information and see whether the file has been modified.
83 Provides audit info about who last changed the file and when.</p>
84 </div>
85 <div class='code'>
86 <div class="highlight"><pre><span class="n">FILE_METADATA</span> <span class="o">=</span> <span class="p">{</span>
87 <span class="s2">&quot;api&quot;</span><span class="p">:</span> <span class="s2">&quot;files&quot;</span><span class="p">,</span>
88 <span class="s2">&quot;path&quot;</span><span class="p">:</span> <span class="s2">&quot;files/</span><span class="si">{spreadsheet_id}</span><span class="s2">&quot;</span><span class="p">,</span>
89 <span class="s2">&quot;key_properties&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">],</span>
90 <span class="s2">&quot;replication_method&quot;</span><span class="p">:</span> <span class="s2">&quot;INCREMENTAL&quot;</span><span class="p">,</span>
91 <span class="s2">&quot;replication_keys&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;modifiedTime&quot;</span><span class="p">],</span>
92 <span class="s2">&quot;params&quot;</span><span class="p">:</span> <span class="p">{</span>
93 <span class="s2">&quot;fields&quot;</span><span class="p">:</span> <span class="s2">&quot;id,name,createdTime,modifiedTime,version,teamDriveId,driveId,lastModifyingUser&quot;</span>
94 <span class="p">}</span>
95<span class="p">}</span></pre></div>
96 </div>
97 </div>
98 <div class='clearall'></div>
99 <div class='section' id='section-3'>
100 <div class='docs'>
101 <div class='octowrap'>
102 <a class='octothorpe' href='#section-3'>#</a>
103 </div>
104 <p>spreadsheet_metadata: Queries the spreadsheet to get basic information on the spreadsheet and its sheets</p>
105 </div>
106 <div class='code'>
107 <div class="highlight"><pre><span class="n">SPREADSHEET_METADATA</span> <span class="o">=</span> <span class="p">{</span>
108 <span class="s2">&quot;api&quot;</span><span class="p">:</span> <span class="s2">&quot;sheets&quot;</span><span class="p">,</span>
109 <span class="s2">&quot;path&quot;</span><span class="p">:</span> <span class="s2">&quot;spreadsheets/</span><span class="si">{spreadsheet_id}</span><span class="s2">&quot;</span><span class="p">,</span>
110 <span class="s2">&quot;key_properties&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;spreadsheetId&quot;</span><span class="p">],</span>
111 <span class="s2">&quot;replication_method&quot;</span><span class="p">:</span> <span class="s2">&quot;FULL_TABLE&quot;</span><span class="p">,</span>
112 <span class="s2">&quot;params&quot;</span><span class="p">:</span> <span class="p">{</span>
113 <span class="s2">&quot;includeGridData&quot;</span><span class="p">:</span> <span class="s2">&quot;false&quot;</span>
114 <span class="p">}</span>
115<span class="p">}</span></pre></div>
116 </div>
117 </div>
118 <div class='clearall'></div>
119 <div class='section' id='section-4'>
120 <div class='docs'>
121 <div class='octowrap'>
122 <a class='octothorpe' href='#section-4'>#</a>
123 </div>
124 <p>sheet_metadata: Gets the header row and first data row (rows 1 &amp; 2) from a sheet in the spreadsheet.
125This endpoint includes detailed metadata about each cell in the header and first data row,
126 including data type, formatting, etc.</p>
127 </div>
128 <div class='code'>
129 <div class="highlight"><pre><span class="n">SHEET_METADATA</span> <span class="o">=</span> <span class="p">{</span>
130 <span class="s2">&quot;api&quot;</span><span class="p">:</span> <span class="s2">&quot;sheets&quot;</span><span class="p">,</span>
131 <span class="s2">&quot;path&quot;</span><span class="p">:</span> <span class="s2">&quot;spreadsheets/</span><span class="si">{spreadsheet_id}</span><span class="s2">&quot;</span><span class="p">,</span>
132 <span class="s2">&quot;key_properties&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;sheetId&quot;</span><span class="p">],</span>
133 <span class="s2">&quot;replication_method&quot;</span><span class="p">:</span> <span class="s2">&quot;FULL_TABLE&quot;</span><span class="p">,</span>
134 <span class="s2">&quot;params&quot;</span><span class="p">:</span> <span class="p">{</span>
135 <span class="s2">&quot;includeGridData&quot;</span><span class="p">:</span> <span class="s2">&quot;true&quot;</span><span class="p">,</span>
136 <span class="s2">&quot;ranges&quot;</span><span class="p">:</span> <span class="s2">&quot;&#39;</span><span class="si">{sheet_title}</span><span class="s2">&#39;!1:2&quot;</span>
137 <span class="p">}</span>
138<span class="p">}</span></pre></div>
139 </div>
140 </div>
141 <div class='clearall'></div>
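<p>The <code>{spreadsheet_id}</code> and <code>{sheet_title}</code> placeholders in <code>path</code> and <code>params</code> look like <code>str.format</code> fields. A minimal sketch of filling them in, assuming that is how they are substituted; the helper name <code>render</code> and the sample values are hypothetical:</p>

```python
# Copy of the stream definition above, trimmed to the fields used here.
SHEET_METADATA = {
    "api": "sheets",
    "path": "spreadsheets/{spreadsheet_id}",
    "params": {
        "includeGridData": "true",
        "ranges": "'{sheet_title}'!1:2",
    },
}


def render(stream, **values):
    """Substitute placeholder values into the path and every string param."""
    path = stream["path"].format(**values)
    params = {
        key: value.format(**values) if isinstance(value, str) else value
        for key, value in stream.get("params", {}).items()
    }
    return path, params


# "abc123" and "Sheet1" are made-up values for illustration.
path, params = render(SHEET_METADATA, spreadsheet_id="abc123", sheet_title="Sheet1")
```

<p>The single quotes around the sheet title are part of the range notation, which is why they appear inside the template rather than being added by the caller.</p>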
142 <div class='section' id='section-5'>
143 <div class='docs'>
144 <div class='octowrap'>
145 <a class='octothorpe' href='#section-5'>#</a>
146 </div>
147 <p>sheets_loaded: Queries a batch of rows for each sheet in the spreadsheet.
148Each query uses the <code>values</code> endpoint to get data only, without the formatting/type metadata.</p>
149 </div>
150 <div class='code'>
151 <div class="highlight"><pre><span class="n">SHEETS_LOADED</span> <span class="o">=</span> <span class="p">{</span>
152 <span class="s2">&quot;api&quot;</span><span class="p">:</span> <span class="s2">&quot;sheets&quot;</span><span class="p">,</span>
153 <span class="s2">&quot;path&quot;</span><span class="p">:</span> <span class="s2">&quot;spreadsheets/</span><span class="si">{spreadsheet_id}</span><span class="s2">/values/&#39;</span><span class="si">{sheet_title}</span><span class="s2">&#39;!</span><span class="si">{range_rows}</span><span class="s2">&quot;</span><span class="p">,</span>
154 <span class="s2">&quot;data_key&quot;</span><span class="p">:</span> <span class="s2">&quot;values&quot;</span><span class="p">,</span>
155 <span class="s2">&quot;key_properties&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;spreadsheetId&quot;</span><span class="p">,</span> <span class="s2">&quot;sheetId&quot;</span><span class="p">,</span> <span class="s2">&quot;loadDate&quot;</span><span class="p">],</span>
156 <span class="s2">&quot;replication_method&quot;</span><span class="p">:</span> <span class="s2">&quot;FULL_TABLE&quot;</span><span class="p">,</span>
157 <span class="s2">&quot;params&quot;</span><span class="p">:</span> <span class="p">{</span>
158 <span class="s2">&quot;dateTimeRenderOption&quot;</span><span class="p">:</span> <span class="s2">&quot;SERIAL_NUMBER&quot;</span><span class="p">,</span>
159 <span class="s2">&quot;valueRenderOption&quot;</span><span class="p">:</span> <span class="s2">&quot;UNFORMATTED_VALUE&quot;</span><span class="p">,</span>
160 <span class="s2">&quot;majorDimension&quot;</span><span class="p">:</span> <span class="s2">&quot;ROWS&quot;</span>
161 <span class="p">}</span>
162<span class="p">}</span></pre></div>
163 </div>
164 </div>
165 <div class='clearall'></div>
166 <div class='section' id='section-6'>
167 <div class='docs'>
168 <div class='octowrap'>
169 <a class='octothorpe' href='#section-6'>#</a>
170 </div>
171 <p>Ensure streams are always iterated in the same logical order.</p>
172 </div>
173 <div class='code'>
174 <div class="highlight"><pre><span class="n">STREAMS</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">()</span>
175<span class="n">STREAMS</span><span class="p">[</span><span class="s1">&#39;file_metadata&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">FILE_METADATA</span>
176<span class="n">STREAMS</span><span class="p">[</span><span class="s1">&#39;spreadsheet_metadata&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">SPREADSHEET_METADATA</span>
177<span class="n">STREAMS</span><span class="p">[</span><span class="s1">&#39;sheet_metadata&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">SHEET_METADATA</span>
178<span class="n">STREAMS</span><span class="p">[</span><span class="s1">&#39;sheets_loaded&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">SHEETS_LOADED</span>
179
180</pre></div>
181 </div>
182 </div>
183 <div class='clearall'></div>
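<p>Because <code>STREAMS</code> is an <code>OrderedDict</code>, iterating over it yields the streams in insertion order. The sketch below uses trimmed stand-ins for the stream dictionaries (only <code>replication_method</code> is kept) to show the iteration order and how the full-table streams could be picked out; the variable names <code>order</code> and <code>full_table</code> are illustrative.</p>

```python
from collections import OrderedDict

# Trimmed stand-ins for the stream dictionaries defined above.
STREAMS = OrderedDict()
STREAMS["file_metadata"] = {"replication_method": "INCREMENTAL"}
STREAMS["spreadsheet_metadata"] = {"replication_method": "FULL_TABLE"}
STREAMS["sheet_metadata"] = {"replication_method": "FULL_TABLE"}
STREAMS["sheets_loaded"] = {"replication_method": "FULL_TABLE"}

# Iteration follows insertion order, so file_metadata always comes first.
order = list(STREAMS)

# Select streams by replication method while preserving that order.
full_table = [
    name
    for name, config in STREAMS.items()
    if config["replication_method"] == "FULL_TABLE"
]
```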
184</div>
185</body>