<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>streams.py</title>
<link rel="stylesheet" href="pycco.css">
</head>
<body>
<div id='container'>
<div id="background"></div>
<div class='section'>
<div class='docs'><h1>streams.py</h1></div>
</div>
<div class='clearall'>
<div class='section' id='section-0'>
<div class='docs'>
<div class='octowrap'>
<a class='octothorpe' href='#section-0'>#</a>
</div>
<p><code>streams.py:STREAMS</code> is an <code>OrderedDict</code> only because we want to loop over it in the same order
every time.</p>
<p>Otherwise it is the same global variable found in other taps of this style: it maps each stream name to a
dictionary describing that stream.</p>
<p>Some notable things we learn in this file:</p>
<ul>
<li>
<p><code>api</code> is either <code>"files"</code> or <code>"sheets"</code></p>
<ul>
<li>We saw this used in <code>client.py:GoogleClient.request()</code> to switch the base URL of the request</li>
</ul>
</li>
<li>
<p><code>"file_metadata"</code> is the only incremental stream</p>
</li>
<li>
<p>Full table streams include:</p>
<ul>
<li><code>"spreadsheet_metadata"</code></li>
<li><code>"sheet_metadata"</code></li>
<li><code>"sheets_loaded"</code></li>
</ul>
</li>
<li>
<p><code>"sheets_loaded"</code> is the only stream with a <code>"data_key"</code></p>
<ul>
<li>We typically see <code>data_key</code> be the name of the key to get data out of “envelope” responses</li>
</ul>
</li>
</ul>
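<p>The two ideas above, switching the base URL on <code>api</code> and unwrapping an envelope with <code>data_key</code>,
can be sketched as follows. This is only an illustration: <code>BASE_URLS</code>, <code>build_url</code> and
<code>extract_records</code> are hypothetical names, and the real logic lives in <code>client.py</code>, which may differ.</p>

```python
# Hypothetical mapping from a stream's "api" value to a base URL.
# The URLs are the public Drive v3 and Sheets v4 roots; whether the tap
# uses exactly these strings is an assumption.
BASE_URLS = {
    "files": "https://www.googleapis.com/drive/v3",
    "sheets": "https://sheets.googleapis.com/v4",
}


def build_url(stream_config, **path_args):
    """Join the base URL selected by "api" with the formatted "path"."""
    base_url = BASE_URLS[stream_config["api"]]
    path = stream_config["path"].format(**path_args)
    return "{}/{}".format(base_url, path)


def extract_records(response_json, stream_config):
    """Unwrap an "envelope" response when the stream defines a data_key."""
    data_key = stream_config.get("data_key")
    if data_key is None:
        return response_json  # no envelope: the root is the record
    return response_json.get(data_key, [])


# Trimmed-down stream definitions, just enough for the example.
file_metadata = {"api": "files", "path": "files/{spreadsheet_id}"}
sheets_loaded = {
    "api": "sheets",
    "path": "spreadsheets/{spreadsheet_id}",
    "data_key": "values",
}

print(build_url(file_metadata, spreadsheet_id="abc123"))
print(extract_records({"range": "A2:B3", "values": [[1, 2], [3, 4]]}, sheets_loaded))
```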
</div>
<div class='code'>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">OrderedDict</span></pre></div>
</div>
</div>
<div class='clearall'></div>
<div class='section' id='section-1'>
<div class='docs'>
<div class='octowrap'>
<a class='octothorpe' href='#section-1'>#</a>
</div>
<p>streams: API URL endpoints to be called</p>
<p>properties:</p>
<ul>
<li><code>&lt;root node&gt;</code>: Plural stream name for the endpoint</li>
<li><code>path</code>: API endpoint relative path; when added to the base URL, it creates the full path;
default = stream_name</li>
<li><code>key_properties</code>: Primary key fields for identifying an endpoint record</li>
<li><code>replication_method</code>: INCREMENTAL or FULL_TABLE</li>
<li><code>replication_keys</code>: bookmark_field(s), typically a date-time, used for filtering the results
and setting the state</li>
<li><code>params</code>: Query, sort, and other endpoint-specific parameters; default = {}</li>
<li><code>data_key</code>: JSON element containing the results list for the endpoint;
default = root (no data_key)</li>
</ul>
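<p>The documented defaults can be made explicit with a small helper. This is a minimal sketch:
<code>with_defaults</code> is a hypothetical function that is not part of the tap, and representing
"root" as <code>None</code> is an assumption.</p>

```python
def with_defaults(stream_name, stream_config):
    """Return a copy of stream_config with the documented defaults filled in."""
    resolved = dict(stream_config)
    resolved.setdefault("path", stream_name)  # default = stream_name
    resolved.setdefault("params", {})         # default = {}
    resolved.setdefault("data_key", None)     # None stands for "root" (assumption)
    return resolved


# A made-up stream that relies on every default.
example = with_defaults("example_stream", {
    "key_properties": ["id"],
    "replication_method": "FULL_TABLE",
})
print(example["path"], example["params"], example["data_key"])
```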
</div>
<div class='code'>
<div class="highlight"><pre></pre></div>
</div>
</div>
<div class='clearall'></div>
<div class='section' id='section-2'>
<div class='docs'>
<div class='octowrap'>
<a class='octothorpe' href='#section-2'>#</a>
</div>
<p>file_metadata: Queries the Google Drive API to get file information and to check whether the file has been modified.
Provides audit info about who last changed the file and when.</p>
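<p>Because <code>modifiedTime</code> is the replication key, a sync can compare it with the stored bookmark to decide
whether anything changed. The sketch below assumes RFC 3339 timestamps ending in <code>Z</code>;
<code>parse_rfc3339</code> and <code>file_changed</code> are hypothetical helpers, not the tap's actual code.</p>

```python
from datetime import datetime


def parse_rfc3339(value):
    """Parse a timestamp such as "2020-01-01T12:00:00.000Z" into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def file_changed(file_record, bookmark):
    """True when the file's modifiedTime is later than the stored bookmark."""
    if bookmark is None:
        return True  # first run: nothing to compare against
    return parse_rfc3339(file_record["modifiedTime"]) > parse_rfc3339(bookmark)


record = {"id": "abc123", "modifiedTime": "2020-03-02T08:30:00.000Z"}
print(file_changed(record, "2020-03-01T00:00:00.000Z"))  # True
print(file_changed(record, "2020-03-02T08:30:00.000Z"))  # False
```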
</div>
<div class='code'>
<div class="highlight"><pre><span class="n">FILE_METADATA</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"api"</span><span class="p">:</span> <span class="s2">"files"</span><span class="p">,</span>
<span class="s2">"path"</span><span class="p">:</span> <span class="s2">"files/</span><span class="si">{spreadsheet_id}</span><span class="s2">"</span><span class="p">,</span>
<span class="s2">"key_properties"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"id"</span><span class="p">],</span>
<span class="s2">"replication_method"</span><span class="p">:</span> <span class="s2">"INCREMENTAL"</span><span class="p">,</span>
<span class="s2">"replication_keys"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"modifiedTime"</span><span class="p">],</span>
<span class="s2">"params"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"fields"</span><span class="p">:</span> <span class="s2">"id,name,createdTime,modifiedTime,version,teamDriveId,driveId,lastModifyingUser"</span>
<span class="p">}</span>
<span class="p">}</span></pre></div>
</div>
</div>
<div class='clearall'></div>
<div class='section' id='section-3'>
<div class='docs'>
<div class='octowrap'>
<a class='octothorpe' href='#section-3'>#</a>
</div>
<p>spreadsheet_metadata: Queries the spreadsheet to get basic information about the spreadsheet and its sheets.</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="n">SPREADSHEET_METADATA</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"api"</span><span class="p">:</span> <span class="s2">"sheets"</span><span class="p">,</span>
<span class="s2">"path"</span><span class="p">:</span> <span class="s2">"spreadsheets/</span><span class="si">{spreadsheet_id}</span><span class="s2">"</span><span class="p">,</span>
<span class="s2">"key_properties"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"spreadsheetId"</span><span class="p">],</span>
<span class="s2">"replication_method"</span><span class="p">:</span> <span class="s2">"FULL_TABLE"</span><span class="p">,</span>
<span class="s2">"params"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"includeGridData"</span><span class="p">:</span> <span class="s2">"false"</span>
<span class="p">}</span>
<span class="p">}</span></pre></div>
</div>
</div>
<div class='clearall'></div>
<div class='section' id='section-4'>
<div class='docs'>
<div class='octowrap'>
<a class='octothorpe' href='#section-4'>#</a>
</div>
<p>sheet_metadata: Gets the header row and first data row (rows 1 &amp; 2) from a sheet in the spreadsheet.
This endpoint includes detailed metadata about each cell in the header and first data row,
including data type, formatting, etc.</p>
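<p>The <code>ranges</code> parameter is a template that needs the sheet title filled in before the request is made.
One way to do that is sketched below; <code>format_params</code> is a hypothetical helper, and the tap may
substitute the value differently.</p>

```python
# Same params as the sheet_metadata stream definition.
SHEET_METADATA_PARAMS = {
    "includeGridData": "true",
    "ranges": "'{sheet_title}'!1:2",
}


def format_params(params, **values):
    """Fill the {placeholders} in every param value, leaving the template untouched."""
    return {key: value.format(**values) for key, value in params.items()}


params = format_params(SHEET_METADATA_PARAMS, sheet_title="Sheet1")
print(params["ranges"])  # 'Sheet1'!1:2
```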
</div>
<div class='code'>
<div class="highlight"><pre><span class="n">SHEET_METADATA</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"api"</span><span class="p">:</span> <span class="s2">"sheets"</span><span class="p">,</span>
<span class="s2">"path"</span><span class="p">:</span> <span class="s2">"spreadsheets/</span><span class="si">{spreadsheet_id}</span><span class="s2">"</span><span class="p">,</span>
<span class="s2">"key_properties"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"sheetId"</span><span class="p">],</span>
<span class="s2">"replication_method"</span><span class="p">:</span> <span class="s2">"FULL_TABLE"</span><span class="p">,</span>
<span class="s2">"params"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"includeGridData"</span><span class="p">:</span> <span class="s2">"true"</span><span class="p">,</span>
<span class="s2">"ranges"</span><span class="p">:</span> <span class="s2">"'</span><span class="si">{sheet_title}</span><span class="s2">'!1:2"</span>
<span class="p">}</span>
<span class="p">}</span></pre></div>
</div>
</div>
<div class='clearall'></div>
<div class='section' id='section-5'>
<div class='docs'>
<div class='octowrap'>
<a class='octothorpe' href='#section-5'>#</a>
</div>
<p>sheets_loaded: Queries a batch of rows for each sheet in the spreadsheet.
Each query uses the <code>values</code> endpoint to get data only, without the formatting/type metadata.</p>
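<p>With <code>dateTimeRenderOption</code> set to <code>SERIAL_NUMBER</code>, date and time cells come back as numbers:
the whole part counts days since 30 December 1899 and the fraction is the time of day. A minimal conversion
might look like this; <code>serial_to_datetime</code> is a hypothetical helper, and how the tap itself converts
these values is not shown here.</p>

```python
from datetime import datetime, timedelta

# Day zero for Sheets serial numbers.
SHEETS_EPOCH = datetime(1899, 12, 30)


def serial_to_datetime(serial_number):
    """Convert a Sheets serial date number into a naive datetime."""
    return SHEETS_EPOCH + timedelta(days=serial_number)


print(serial_to_datetime(43831))    # 2020-01-01 00:00:00
print(serial_to_datetime(43831.5))  # 2020-01-01 12:00:00
```

<p>The result is naive on purpose: the serial number carries no time zone, so any zone handling is left to the caller.</p>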
</div>
<div class='code'>
<div class="highlight"><pre><span class="n">SHEETS_LOADED</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"api"</span><span class="p">:</span> <span class="s2">"sheets"</span><span class="p">,</span>
<span class="s2">"path"</span><span class="p">:</span> <span class="s2">"spreadsheets/</span><span class="si">{spreadsheet_id}</span><span class="s2">/values/'</span><span class="si">{sheet_title}</span><span class="s2">'!</span><span class="si">{range_rows}</span><span class="s2">"</span><span class="p">,</span>
<span class="s2">"data_key"</span><span class="p">:</span> <span class="s2">"values"</span><span class="p">,</span>
<span class="s2">"key_properties"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"spreadsheetId"</span><span class="p">,</span> <span class="s2">"sheetId"</span><span class="p">,</span> <span class="s2">"loadDate"</span><span class="p">],</span>
<span class="s2">"replication_method"</span><span class="p">:</span> <span class="s2">"FULL_TABLE"</span><span class="p">,</span>
<span class="s2">"params"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"dateTimeRenderOption"</span><span class="p">:</span> <span class="s2">"SERIAL_NUMBER"</span><span class="p">,</span>
<span class="s2">"valueRenderOption"</span><span class="p">:</span> <span class="s2">"UNFORMATTED_VALUE"</span><span class="p">,</span>
<span class="s2">"majorDimension"</span><span class="p">:</span> <span class="s2">"ROWS"</span>
<span class="p">}</span>
<span class="p">}</span></pre></div>
</div>
</div>
<div class='clearall'></div>
<div class='section' id='section-6'>
<div class='docs'>
<div class='octowrap'>
<a class='octothorpe' href='#section-6'>#</a>
</div>
<p>Register the streams in a fixed, logical order so that they are always looped over in the same sequence.</p>
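<p>As a rough illustration of why the order matters, the loop below walks a cut-down copy of <code>STREAMS</code> and
always visits the streams in insertion order. The stream dictionaries are reduced to one field for brevity, and
<code>names_by_method</code> is a hypothetical helper.</p>

```python
from collections import OrderedDict

# Cut-down copy of STREAMS: only the replication method is kept.
STREAMS = OrderedDict()
STREAMS["file_metadata"] = {"replication_method": "INCREMENTAL"}
STREAMS["spreadsheet_metadata"] = {"replication_method": "FULL_TABLE"}
STREAMS["sheet_metadata"] = {"replication_method": "FULL_TABLE"}
STREAMS["sheets_loaded"] = {"replication_method": "FULL_TABLE"}


def names_by_method(streams, method):
    """List stream names with the given replication method, in insertion order."""
    return [name for name, config in streams.items()
            if config["replication_method"] == method]


for stream_name in STREAMS:
    print(stream_name)
```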
</div>
<div class='code'>
<div class="highlight"><pre><span class="n">STREAMS</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">()</span>
<span class="n">STREAMS</span><span class="p">[</span><span class="s1">'file_metadata'</span><span class="p">]</span> <span class="o">=</span> <span class="n">FILE_METADATA</span>
<span class="n">STREAMS</span><span class="p">[</span><span class="s1">'spreadsheet_metadata'</span><span class="p">]</span> <span class="o">=</span> <span class="n">SPREADSHEET_METADATA</span>
<span class="n">STREAMS</span><span class="p">[</span><span class="s1">'sheet_metadata'</span><span class="p">]</span> <span class="o">=</span> <span class="n">SHEET_METADATA</span>
<span class="n">STREAMS</span><span class="p">[</span><span class="s1">'sheets_loaded'</span><span class="p">]</span> <span class="o">=</span> <span class="n">SHEETS_LOADED</span>
</pre></div>
</div>
</div>
<div class='clearall'></div>
</div>
</body>