]>
Commit | Line | Data |
---|---|---|
955edd33 | 1 | {-# OPTIONS_GHC -fno-warn-unused-imports #-} |
2 | ||
3 | module Pipes.Text.Tutorial ( | |
4 | -- * Effectful Text | |
5 | -- $intro | |
b091cbeb | 6 | |
955edd33 | 7 | -- ** @Pipes.Text@ |
8 | -- $pipestext | |
b091cbeb | 9 | |
955edd33 | 10 | -- ** @Pipes.Text.IO@ |
11 | -- $pipestextio | |
b091cbeb | 12 | |
955edd33 | 13 | -- ** @Pipes.Text.Encoding@ |
14 | -- $pipestextencoding | |
b091cbeb | 15 | |
16 | -- ** Implicit chunking | |
17 | -- $chunks | |
18 | ||
955edd33 | 19 | -- * Lenses |
20 | -- $lenses | |
21 | ||
22 | -- ** @view@ \/ @(^.)@ | |
23 | -- $view | |
24 | ||
25 | -- ** @over@ \/ @(%~)@ | |
26 | -- $over | |
27 | ||
28 | -- ** @zoom@ | |
29 | -- $zoom | |
b091cbeb | 30 | |
31 | ||
32 | ||
955edd33 | 33 | |
34 | -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@ | |
35 | -- $special | |
36 | ) where | |
6c2fffdc | 37 | |
955edd33 | 38 | import Pipes |
39 | import Pipes.Text | |
40 | import Pipes.Text.IO | |
41 | import Pipes.Text.Encoding | |
6c2fffdc | 42 | |
955edd33 | 43 | {- $intro |
44 | This package provides @pipes@ utilities for /character streams/, | |
45 | realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/, | |
c3e8e478 | 46 | and thus the @Text@ type we are using is always the one from @Data.Text@, not @Data.Text.Lazy@ |
47 | The type @Producer Text m r@, as we are using it, is a sort of /pipes/ equivalent of | |
955edd33 | 48 | the lazy @Text@ type. |
b091cbeb | 49 | -} |
955edd33 | 50 | |
b091cbeb | 51 | {- $pipestext |
955edd33 | 52 | The main @Pipes.Text@ module provides many functions equivalent |
53 | in one way or another to the pure functions in | |
54 | <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy> | |
55 | (and the corresponding @Prelude@ functions for @String@ s): they transform, | |
56 | divide, group and fold text streams. Though @Producer Text m r@ | |
57 | is the type of \'effectful Text\', the functions in @Pipes.Text@ are \'pure\' | |
58 | in the sense that they are uniformly monad-independent. | |
b091cbeb | 59 | -} |
60 | ||
61 | {- $pipestextencoding | |
62 | In the @text@ library, @Data.Text.Lazy.Encoding@ | |
c3e8e478 | 63 | handles inter-operation with @Data.ByteString.Lazy@. Similarly here, @Pipes.Text.Encoding@ |
b091cbeb | 64 | provides for interoperation with the \'effectful ByteStrings\' of @Pipes.ByteString@. |
65 | -} | |
66 | ||
67 | {- $pipestextio | |
955edd33 | 68 | Simple /IO/ operations are defined in @Pipes.Text.IO@ - as lazy IO @Text@ |
c3e8e478 | 69 | operations are in @Data.Text.Lazy.IO@. There are also some simple line-based operations |
70 | in @Pipes.Prelude.Text@. The latter do not depend on the conception of effectful text | |
71 | implemented elsewhere in this package, but just improve on the @stdinLn@ and @writeFile@ of | |
72 | @Pipes.Prelude@ and @Pipes.Safe.Prelude@ by replacing 'String' with 'Text' | |
b091cbeb | 73 | -} |
955edd33 | 74 | |
b091cbeb | 75 | |
76 | {- $chunks | |
955edd33 | 77 | Remember that the @Text@ type exported by @Data.Text.Lazy@ is basically |
78 | that of a lazy list of strict @Text@: the implementation is arranged so that | |
79 | the individual strict 'Text' chunks are kept to a reasonable size; the user | |
80 | is not aware of the divisions between the connected 'Text' chunks, but uses | |
81 | operations akin to those for strict text. | |
b091cbeb | 82 | |
83 | So also here: the operations in @Pipes.Text@ are designed to operate on character streams that | |
955edd33 | 84 | in a way that is independent of the boundaries of the underlying @Text@ chunks. |
85 | This means that they may freely split text into smaller texts and /discard empty texts/. | |
86 | The objective, though, is that they should not /concatenate texts/ in order to provide strict upper | |
5dca0cf2 | 87 | bounds on memory usage even for indefinitely complex compositions. |
955edd33 | 88 | |
89 | For example, to stream only the first three lines of 'stdin' to 'stdout' you | |
90 | might write: | |
91 | ||
92 | > import Pipes | |
93 | > import qualified Pipes.Text as Text | |
94 | > import qualified Pipes.Text.IO as Text | |
95 | > import Pipes.Group (takes') | |
5dca0cf2 | 96 | > import Lens.Family (view, over) -- or `Lens.Micro.Mtl` or `Control.Lens` or etc. |
955edd33 | 97 | > |
98 | > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout | |
b091cbeb | 99 | > where |
955edd33 | 100 | > takeLines n = view Text.unlines . takes' n . view Text.lines |
5dca0cf2 | 101 | > -- or equivalently: over Text.unlines (takes' n) |
955edd33 | 102 | |
b091cbeb | 103 | This program will not bring more into memory than what @Text.stdin@ considers |
104 | one chunk of text (~ 32 KB), even if individual lines are split | |
105 | across many chunks. The division into lines does not join Text fragments. | |
955edd33 | 106 | |
107 | -} | |
b091cbeb | 108 | |
109 | ||
955edd33 | 110 | {- $lenses |
111 | As the use of @view@ in this example shows, one superficial difference from @Data.Text.Lazy@ | |
112 | is that many of the operations, like 'lines', are \'lensified\'; this has a | |
113 | number of advantages; in particular it facilitates their use with 'Parser's of Text | |
114 | (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse> | |
115 | sense.) The remarks that follow in this section are for non-lens adepts. | |
116 | ||
117 | Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the | |
118 | intuitively corresponding function when used with @view@ or @(^.)@. Instead of | |
119 | writing: | |
120 | ||
121 | > splitAt 17 producer | |
122 | ||
b091cbeb | 123 | as we would with the Prelude or Text functions called @splitAt@, we write |
955edd33 | 124 | |
125 | > view (splitAt 17) producer | |
126 | ||
127 | or equivalently | |
128 | ||
129 | > producer ^. splitAt 17 | |
130 | ||
131 | This may seem a little indirect, but note that many equivalents of | |
132 | @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively | |
133 | corresponding functions by prefixing them with @(>->)@. Thus something like | |
134 | ||
135 | > stripLines = view Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines | |
136 | ||
137 | would drop the leading white space from each line. | |
138 | ||
139 | The lenses in this library are marked as /improper/; this just means that | |
140 | they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/. | |
141 | Just for this reason, though, the magnificent complexities of the lens libraries | |
142 | are a distraction. The lens combinators to keep in mind, the ones that make sense for | |
b091cbeb | 143 | our lenses, are @view@, @over@, and @zoom@. |
955edd33 | 144 | |
c3e8e478 | 145 | One need only keep in mind that if @l@ is a @Lens' a b@, then the action of the |
146 | leading operations, @view@, @over@, and @zoom@ are as follows: | |
955edd33 | 147 | |
148 | -} | |
149 | {- $view | |
150 | @view l@ is a function @a -> b@ . Thus @view l a@ (also written @a ^. l@ ) | |
151 | is the corresponding @b@; as was said above, this function will typically be | |
152 | the pipes equivalent of the function you think it is, given its name. So for example | |
153 | ||
955edd33 | 154 | > view (Text.splitAt 300) :: Producer Text m r -> Producer Text (Producer Text m r) |
155 | > Text.stdin ^. splitAt 300 :: Producer Text IO (Producer Text IO r) | |
156 | ||
157 | I.e., it produces the first 300 characters, and returns the rest of the producer. | |
158 | Thus to uppercase the first n characters | |
159 | of a Producer, leaving the rest the same, we could write: | |
160 | ||
955edd33 | 161 | > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper |
162 | > p' | |
b091cbeb | 163 | |
164 | or equivalently: | |
165 | ||
166 | > upper n p = join (p ^. Text.splitAt n >-> Text.toUpper) | |
167 | ||
955edd33 | 168 | -} |
169 | {- $over | |
b091cbeb | 170 | If @l@ is a @Lens a b@, @over l@ is a function @(b -> b) -> a -> a@. |
171 | Thus, given a function that modifies | |
955edd33 | 172 | @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to |
b091cbeb | 173 | the @b@ that we \"see\" in the @a@ through the lens. |
174 | So the type of @over l f@ is @a -> a@ for the concrete type @a@ | |
955edd33 | 175 | (it can also be written @l %~ f@). |
176 | For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@. | |
177 | So above we might have written things like these: | |
178 | ||
955edd33 | 179 | > stripLines = over Text.lines (maps (>-> Text.stripStart)) |
b091cbeb | 180 | > stripLines = Text.lines %~ maps (>-> Text.stripStart) |
955edd33 | 181 | > upper n = Text.splitAt n %~ (>-> Text.toUpper) |
955edd33 | 182 | -} |
b091cbeb | 183 | |
955edd33 | 184 | {- $zoom |
185 | @zoom l@, finally, is a function from a @Parser b m r@ | |
186 | to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@). | |
187 | Its use is easiest to see with an decoding lens like 'utf8', which | |
188 | \"sees\" a Text producer hidden inside a ByteString producer: | |
189 | @drawChar@ is a Text parser, returning a @Maybe Char@, @zoom utf8 drawChar@ is | |
190 | a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns | |
191 | a list of everything produced from a Producer, leaving only the return value; it would | |
192 | usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@ | |
193 | returns a list of Text chunks containing the first seventeen Chars, and returns the rest of | |
194 | the Text Producer for further parsing. Suppose that we want, inexplicably, to | |
195 | modify the casing of a Text Producer according to any instruction it might | |
196 | contain at the start. Then we might write something like this: | |
197 | ||
198 | > obey :: Monad m => Producer Text m b -> Producer Text m b | |
199 | > obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p | |
200 | > let seven = T.concat ts | |
201 | > case T.toUpper seven of | |
202 | > "TOUPPER" -> p' >-> Text.toUpper | |
203 | > "TOLOWER" -> p' >-> Text.toLower | |
204 | > _ -> do yield seven | |
205 | > p' | |
206 | ||
207 | ||
b091cbeb | 208 | > -- > let doc = each ["toU","pperTh","is document.\n"] |
209 | > -- > runEffect $ obey doc >-> Text.stdout | |
210 | > -- THIS DOCUMENT. | |
955edd33 | 211 | |
212 | The purpose of exporting lenses is the mental economy achieved with this three-way | |
213 | applicability. That one expression, e.g. @lines@ or @splitAt 17@ can have these | |
214 | three uses is no more surprising than that a pipe can act as a function modifying | |
215 | the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@ | |
216 | -- but can /also/ modify the inputs to a consumer by using @>->@ to its right: | |
217 | @pipe >-> consumer@ | |
218 | ||
219 | The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by | |
220 | both <http://hackage.haskell.org/package/lens lens> and | |
221 | <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained | |
222 | in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial> | |
223 | and to some extent in the @Pipes.Text.Encoding@ module here. | |
224 | ||
225 | -} | |
b091cbeb | 226 | |
955edd33 | 227 | {- $special |
b091cbeb | 228 | The simple programs using the 'lines' lens reveal a more important difference from @Data.Text.Lazy@ . |
955edd33 | 229 | This is in the types that are most closely associated with our central text type, |
230 | @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like | |
231 | ||
232 | > splitAt :: Int -> Text -> (Text, Text) | |
233 | > lines :: Text -> [Text] | |
234 | > chunksOf :: Int -> Text -> [Text] | |
235 | ||
236 | which relate a Text with a pair of Texts or a list of Texts. | |
237 | The corresponding functions here (taking account of \'lensification\') are | |
238 | ||
239 | > view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r) | |
240 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
241 | > view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r | |
242 | ||
243 | Some of the types may be more readable if you imagine that we have introduced | |
244 | our own type synonyms | |
245 | ||
246 | > type Text m r = Producer T.Text m r | |
247 | > type Texts m r = FreeT (Producer T.Text m) m r | |
248 | ||
249 | Then we would think of the types above as | |
250 | ||
251 | > view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) | |
252 | > view lines :: (Monad m) => Text m r -> Texts m r | |
253 | > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r | |
254 | ||
255 | which brings one closer to the types of the similar functions in @Data.Text.Lazy@ | |
256 | ||
257 | In the type @Producer Text m (Producer Text m r)@ the second | |
258 | element of the \'pair\' of effectful Texts cannot simply be retrieved | |
259 | with something like 'snd'. This is an \'effectful\' pair, and one must work | |
260 | through the effects of the first element to arrive at the second Text stream, even | |
261 | if you are proposing to throw the Text in the first element away. | |
262 | Note that we use Control.Monad.join to fuse the pair back together, since it specializes to | |
263 | ||
264 | > join :: Monad m => Producer Text m (Producer m r) -> Producer m r | |
265 | ||
266 | The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions, | |
267 | @FreeT (Producer m Text) m r@ -- our @Texts m r@ -- is the type of (effectful) | |
268 | lists of (effectful) texts. The type @([Text],r)@ might be seen to gather | |
269 | together things of the forms: | |
270 | ||
271 | > r | |
272 | > (Text,r) | |
273 | > (Text, (Text, r)) | |
274 | > (Text, (Text, (Text, r))) | |
275 | > (Text, (Text, (Text, (Text, r)))) | |
276 | > ... | |
277 | ||
278 | (We might also have identified the sum of those types with @Free ((,) Text) r@ | |
279 | -- or, more absurdly, @FreeT ((,) Text) Identity r@.) | |
280 | ||
281 | Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called | |
282 | @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence: | |
283 | ||
284 | > m r | |
285 | > Text m r | |
286 | > Text m (Text m r) | |
287 | > Text m (Text m (Text m r)) | |
288 | > Text m (Text m (Text m (Text m r))) | |
289 | > ... | |
290 | ||
291 | We might have used a more specialized type in place of @FreeT (Producer a m) m r@, | |
292 | or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct | |
293 | result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@ . | |
294 | ||
295 | One might think that | |
296 | ||
297 | > lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r) | |
298 | > view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
299 | ||
300 | should really have the type | |
301 | ||
302 | > lines :: Monad m => Pipe Text Text m r | |
303 | ||
304 | as e.g. 'toUpper' does. But this would spoil the control we are | |
305 | attempting to maintain over the size of chunks. It is in fact just | |
306 | as unreasonable to want such a pipe as to want | |
307 | ||
308 | > Data.Text.Lazy.lines :: Text -> Text | |
309 | ||
310 | to 'rechunk' the strict Text chunks inside the lazy Text to respect | |
311 | line boundaries. In fact we have | |
312 | ||
313 | > Data.Text.Lazy.lines :: Text -> [Text] | |
314 | > Prelude.lines :: String -> [String] | |
315 | ||
316 | where the elements of the list are themselves lazy Texts or Strings; the use | |
317 | of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this. | |
318 | ||
319 | The @Pipes.Group@ module, which can generally be imported without qualification, | |
320 | provides many functions for working with things of type @FreeT (Producer a m) m r@. | |
321 | In particular it conveniently exports the constructors for @FreeT@ and the associated | |
322 | @FreeF@ type -- a fancy form of @Either@, namely | |
323 | ||
324 | > data FreeF f a b = Pure a | Free (f b) | |
325 | ||
326 | for pattern-matching. Consider the implementation of the 'words' function, or | |
327 | of the part of the lens that takes us to the words; it is compact but exhibits many | |
328 | of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@ | |
329 | constuctors. Keep in mind that | |
330 | ||
331 | > newtype FreeT f m a = FreeT (m (FreeF f a (FreeT f m a))) | |
332 | > next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r)) | |
333 | ||
334 | Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity'; | |
335 | the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad: | |
336 | ||
337 | > words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
338 | > words p = FreeT $ do -- With 'next' we will inspect p's first chunk, excluding spaces; | |
339 | > x <- next (p >-> dropWhile isSpace) -- note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'. | |
340 | > return $ case x of -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)' | |
341 | > Left r -> Pure r -- 'Left' means we got no Text chunk, but only the return value; so we are done. | |
342 | > Right (txt, p') -> Free $ do -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad | |
343 | > p'' <- view (break isSpace) -- When we apply 'break isSpace', we get a Producer that returns a Producer; | |
344 | > (yield txt >> p') -- so here we yield everything up to the next space, and get the rest back. | |
345 | > return (words p'') -- We then carry on with the rest, which is likely to begin with space. | |
346 | ||
347 | -} |