Commit | Line | Data |
---|---|---|
955edd33 | 1 | {-# OPTIONS_GHC -fno-warn-unused-imports #-} |
2 | ||
3 | module Pipes.Text.Tutorial ( | |
4 | -- * Effectful Text | |
5 | -- $intro | |
b091cbeb | 6 | |
955edd33 | 7 | -- ** @Pipes.Text@ |
8 | -- $pipestext | |
b091cbeb | 9 | |
955edd33 | 10 | -- ** @Pipes.Text.IO@ |
11 | -- $pipestextio | |
b091cbeb | 12 | |
955edd33 | 13 | -- ** @Pipes.Text.Encoding@ |
14 | -- $pipestextencoding | |
b091cbeb | 15 | |
16 | -- ** Implicit chunking | |
17 | -- $chunks | |
18 | ||
955edd33 | 19 | -- * Lenses |
20 | -- $lenses | |
21 | ||
22 | -- ** @view@ \/ @(^.)@ | |
23 | -- $view | |
24 | ||
25 | -- ** @over@ \/ @(%~)@ | |
26 | -- $over | |
27 | ||
28 | -- ** @zoom@ | |
29 | -- $zoom | |
b091cbeb | 30 | |
31 | ||
32 | ||
955edd33 | 33 | |
34 | -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@ | |
35 | -- $special | |
36 | ) where | |
6c2fffdc | 37 | |
955edd33 | 38 | import Pipes |
39 | import Pipes.Text | |
40 | import Pipes.Text.IO | |
41 | import Pipes.Text.Encoding | |
6c2fffdc | 42 | |
955edd33 | 43 | {- $intro |
44 | This package provides @pipes@ utilities for /character streams/, | |
45 | realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/, | |
46 | and thus the @Text@ type we are using is the one from @Data.Text@, not @Data.Text.Lazy@. | |
47 | But the type @Producer Text m r@, as we are using it, is a sort of /pipes/ equivalent of | |
48 | the lazy @Text@ type. | |
b091cbeb | 49 | -} |
955edd33 | 50 | |
b091cbeb | 51 | {- $pipestext |
955edd33 | 52 | The main @Pipes.Text@ module provides many functions equivalent |
53 | in one way or another to the pure functions in | |
54 | <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy> | |
55 | (and the corresponding @Prelude@ functions for @String@ s): they transform, | |
56 | divide, group and fold text streams. Though @Producer Text m r@ | |
57 | is the type of \'effectful Text\', the functions in @Pipes.Text@ are \'pure\' | |
58 | in the sense that they are uniformly monad-independent. | |
b091cbeb | 59 | -} |
60 | ||
61 | {- $pipestextencoding | |
62 | In the @text@ library, @Data.Text.Lazy.Encoding@ | |
63 | handles inter-operation with @Data.ByteString.Lazy@. Here, @Pipes.Text.Encoding@ | |
64 | provides for interoperation with the \'effectful ByteStrings\' of @Pipes.ByteString@. | |
65 | -} | |
66 | ||
67 | {- $pipestextio | |
955edd33 | 68 | Simple /IO/ operations are defined in @Pipes.Text.IO@, just as lazy IO @Text@ |
b091cbeb | 69 | operations are in @Data.Text.Lazy.IO@. It is generally |
70 | -} | |
955edd33 | 71 | |
b091cbeb | 72 | |
73 | {- $chunks | |
955edd33 | 74 | Remember that the @Text@ type exported by @Data.Text.Lazy@ is basically |
75 | that of a lazy list of strict @Text@: the implementation is arranged so that | |
76 | the individual strict 'Text' chunks are kept to a reasonable size; the user | |
77 | is not aware of the divisions between the connected 'Text' chunks, but uses | |
78 | operations akin to those for strict text. | |
b091cbeb | 79 | |
80 | So also here: the operations in @Pipes.Text@ are designed to operate on character streams | |
955edd33 | 81 | in a way that is independent of the boundaries of the underlying @Text@ chunks. |
82 | This means that they may freely split text into smaller texts and /discard empty texts/. | |
83 | The objective, though, is that they should never /concatenate texts/, so that memory usage | |
84 | keeps a strict upper bound. | |
85 | ||
86 | For example, to stream only the first three lines of 'stdin' to 'stdout' you | |
87 | might write: | |
88 | ||
89 | > import Pipes | |
90 | > import qualified Pipes.Text as Text | |
91 | > import qualified Pipes.Text.IO as Text | |
92 | > import Pipes.Group (takes') | |
b091cbeb | 93 | > import Lens.Family (view, (%~)) -- or, Control.Lens |
955edd33 | 94 | > |
95 | > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout | |
b091cbeb | 96 | > where |
955edd33 | 97 | > takeLines n = view Text.unlines . takes' n . view Text.lines |
b091cbeb | 98 | > -- or equivalently: Text.lines %~ takes' n |
955edd33 | 99 | |
b091cbeb | 100 | This program will not bring more into memory than what @Text.stdin@ considers |
101 | one chunk of text (~ 32 KB), even if individual lines are split | |
102 | across many chunks. The division into lines does not join Text fragments. | |
955edd33 | 103 | |
104 | -} | |
b091cbeb | 105 | |
106 | ||
955edd33 | 107 | {- $lenses |
108 | As the use of @view@ in this example shows, one superficial difference from @Data.Text.Lazy@ | |
109 | is that many of the operations, like 'lines', are \'lensified\'. This has a | |
110 | number of advantages; in particular it facilitates their use with 'Parser's of Text | |
111 | (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse> | |
112 | sense). The remarks that follow in this section are for readers who are not lens adepts. | |
113 | ||
114 | Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the | |
115 | intuitively corresponding function when used with @view@ or @(^.)@. Instead of | |
116 | writing: | |
117 | ||
118 | > splitAt 17 producer | |
119 | ||
b091cbeb | 120 | as we would with the Prelude or Text functions called @splitAt@, we write |
955edd33 | 121 | |
122 | > view (splitAt 17) producer | |
123 | ||
124 | or equivalently | |
125 | ||
126 | > producer ^. splitAt 17 | |
127 | ||
128 | This may seem a little indirect, but note that many equivalents of | |
129 | @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively | |
130 | corresponding functions by prefixing them with @(>->)@. Thus something like | |
131 | ||
132 | > stripLines = view Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines | |
133 | ||
134 | would drop the leading white space from each line. | |
135 | ||
136 | The lenses in this library are marked as /improper/; this just means that | |
137 | they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/. | |
138 | Just for this reason, though, the magnificent complexities of the lens libraries | |
139 | are a distraction. The lens combinators to keep in mind, the ones that make sense for | |
b091cbeb | 140 | our lenses, are @view@, @over@, and @zoom@. |
955edd33 | 141 | |
142 | One need only keep in mind that if @l@ is a @Lens' a b@, then: | |
143 | ||
144 | -} | |
145 | {- $view | |
146 | @view l@ is a function @a -> b@. Thus @view l a@ (also written @a ^. l@) | |
147 | is the corresponding @b@; as was said above, this function will typically be | |
148 | the pipes equivalent of the function you think it is, given its name. So for example | |
149 | ||
955edd33 | 150 | > view (Text.splitAt 300) :: Producer Text m r -> Producer Text m (Producer Text m r) |
151 | > Text.stdin ^. Text.splitAt 300 :: Producer Text IO (Producer Text IO ()) | |
152 | ||
153 | I.e., it produces the first 300 characters, and returns the rest of the producer. | |
154 | Thus to uppercase the first n characters | |
155 | of a Producer, leaving the rest the same, we could write: | |
156 | ||
955edd33 | 157 | > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper |
158 | >                 p' | |
b091cbeb | 159 | |
160 | or equivalently: | |
161 | ||
162 | > upper n p = join (p ^. Text.splitAt n >-> Text.toUpper) | |
163 | ||
955edd33 | 164 | -} |
165 | {- $over | |
b091cbeb | 166 | If @l@ is a @Lens' a b@, @over l@ is a function @(b -> b) -> a -> a@. |
167 | Thus, given a function that modifies | |
955edd33 | 168 | @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to |
b091cbeb | 169 | the @b@ that we \"see\" in the @a@ through the lens. |
170 | So the type of @over l f@ is @a -> a@ for the concrete type @a@ | |
955edd33 | 171 | (it can also be written @l %~ f@). |
172 | For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@. | |
173 | So above we might have written things like these: | |
174 | ||
955edd33 | 175 | > stripLines = over Text.lines (maps (>-> Text.stripStart)) |
b091cbeb | 176 | > stripLines = Text.lines %~ maps (>-> Text.stripStart) |
955edd33 | 177 | > upper n = Text.splitAt n %~ (>-> Text.toUpper) |
955edd33 | 178 | -} |
b091cbeb | 179 | |
955edd33 | 180 | {- $zoom |
181 | @zoom l@, finally, is a function from a @Parser b m r@ | |
182 | to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@). | |
183 | Its use is easiest to see with a decoding lens like 'utf8', which | |
184 | \"sees\" a Text producer hidden inside a ByteString producer: | |
185 | @drawChar@ is a Text parser, returning a @Maybe Char@; @zoom utf8 drawChar@ is | |
186 | a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns | |
187 | a list of everything produced from a Producer, leaving only the return value; it would | |
188 | usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@ | |
189 | returns a list of Text chunks containing the first seventeen Chars, and returns the rest of | |
190 | the Text Producer for further parsing. Suppose that we want, inexplicably, to | |
191 | modify the casing of a Text Producer according to any instruction it might | |
192 | contain at the start. Then we might write something like this: | |
193 | ||
194 | > obey :: Monad m => Producer Text m b -> Producer Text m b | |
195 | > obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p | |
196 | >             let seven = T.concat ts | |
197 | >             case T.toUpper seven of | |
198 | >                "TOUPPER" -> p' >-> Text.toUpper | |
199 | >                "TOLOWER" -> p' >-> Text.toLower | |
200 | >                _         -> do yield seven | |
201 | >                                p' | |
202 | ||
203 | ||
b091cbeb | 204 | > -- > let doc = each ["toU","pperTh","is document.\n"] |
205 | > -- > runEffect $ obey doc >-> Text.stdout | |
206 | > -- THIS DOCUMENT. | |
955edd33 | 207 | |
208 | The purpose of exporting lenses is the mental economy achieved with this three-way | |
209 | applicability. That one expression, e.g. @lines@ or @splitAt 17@, can have these | |
210 | three uses is no more surprising than that a pipe can act as a function modifying | |
211 | the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@ | |
212 | -- but can /also/ modify the inputs to a consumer by using @>->@ to its right: | |
213 | @pipe >-> consumer@. | |
214 | ||
215 | The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by | |
216 | both <http://hackage.haskell.org/package/lens lens> and | |
217 | <http://hackage.haskell.org/package/lens-family lens-family>. The use of 'zoom' is explained | |
218 | in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial> | |
219 | and to some extent in the @Pipes.Text.Encoding@ module here. | |
220 | ||
221 | -} | |
b091cbeb | 222 | |
955edd33 | 223 | {- $special |
b091cbeb | 224 | The simple programs using the 'lines' lens reveal a more important difference from @Data.Text.Lazy@. |
955edd33 | 225 | This is in the types that are most closely associated with our central text type, |
226 | @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like | |
227 | ||
228 | > splitAt :: Int -> Text -> (Text, Text) | |
229 | > lines :: Text -> [Text] | |
230 | > chunksOf :: Int -> Text -> [Text] | |
231 | ||
232 | which relate a Text with a pair of Texts or a list of Texts. | |
233 | The corresponding functions here (taking account of \'lensification\') are | |
234 | ||
235 | > view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r) | |
236 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
237 | > view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r | |
238 | ||
239 | Some of the types may be more readable if you imagine that we have introduced | |
240 | our own type synonyms | |
241 | ||
242 | > type Text m r = Producer T.Text m r | |
243 | > type Texts m r = FreeT (Producer T.Text m) m r | |
244 | ||
245 | Then we would think of the types above as | |
246 | ||
247 | > view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) | |
248 | > view lines :: (Monad m) => Text m r -> Texts m r | |
249 | > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r | |
250 | ||
251 | which brings one closer to the types of the similar functions in @Data.Text.Lazy@. | |
252 | ||
253 | In the type @Producer Text m (Producer Text m r)@ the second | |
254 | element of the \'pair\' of effectful Texts cannot simply be retrieved | |
255 | with something like 'snd'. This is an \'effectful\' pair, and one must work | |
256 | through the effects of the first element to arrive at the second Text stream, even | |
257 | if you are proposing to throw the Text in the first element away. | |
258 | Note that we use Control.Monad.join to fuse the pair back together, since it specializes to | |
259 | ||
260 | > join :: Monad m => Producer Text m (Producer Text m r) -> Producer Text m r | |
261 | ||
262 | The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions, | |
263 | @FreeT (Producer Text m) m r@ -- our @Texts m r@ -- is the type of (effectful) | |
264 | lists of (effectful) texts. The type @([Text],r)@ might be seen to gather | |
265 | together things of the forms: | |
266 | ||
267 | > r | |
268 | > (Text,r) | |
269 | > (Text, (Text, r)) | |
270 | > (Text, (Text, (Text, r))) | |
271 | > (Text, (Text, (Text, (Text, r)))) | |
272 | > ... | |
273 | ||
274 | (We might also have identified the sum of those types with @Free ((,) Text) r@ | |
275 | -- or, more absurdly, @FreeT ((,) Text) Identity r@.) | |
276 | ||
277 | Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called | |
278 | @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence: | |
279 | ||
280 | > m r | |
281 | > Text m r | |
282 | > Text m (Text m r) | |
283 | > Text m (Text m (Text m r)) | |
284 | > Text m (Text m (Text m (Text m r))) | |
285 | > ... | |
286 | ||
287 | We might have used a more specialized type in place of @FreeT (Producer a m) m r@, | |
288 | or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct | |
289 | result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@. | |
290 | ||
291 | One might think that | |
292 | ||
293 | > lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r) | |
294 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
295 | ||
296 | should really have the type | |
297 | ||
298 | > lines :: Monad m => Pipe Text Text m r | |
299 | ||
300 | as e.g. 'toUpper' does. But this would spoil the control we are | |
301 | attempting to maintain over the size of chunks. It is in fact just | |
302 | as unreasonable to want such a pipe as to want | |
303 | ||
304 | > Data.Text.Lazy.lines :: Text -> Text | |
305 | ||
306 | to \'rechunk\' the strict Text chunks inside the lazy Text to respect | |
307 | line boundaries. In fact we have | |
308 | ||
309 | > Data.Text.Lazy.lines :: Text -> [Text] | |
310 | > Prelude.lines :: String -> [String] | |
311 | ||
312 | where the elements of the list are themselves lazy Texts or Strings; the use | |
313 | of @FreeT (Producer Text m) m r@ is simply the \'effectful\' version of this. | |
314 | ||
315 | The @Pipes.Group@ module, which can generally be imported without qualification, | |
316 | provides many functions for working with things of type @FreeT (Producer a m) m r@. | |
317 | In particular it conveniently exports the constructors for @FreeT@ and the associated | |
318 | @FreeF@ type -- a fancy form of @Either@, namely | |
319 | ||
320 | > data FreeF f a b = Pure a | Free (f b) | |
321 | ||
322 | for pattern-matching. Consider the implementation of the 'words' function, or | |
323 | of the part of the lens that takes us to the words; it is compact but exhibits many | |
324 | of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@ | |
325 | constructors. Keep in mind that | |
326 | ||
327 | > newtype FreeT f m a = FreeT (m (FreeF f a (FreeT f m a))) | |
328 | > next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r)) | |
329 | ||
330 | Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity'; | |
331 | the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad: | |
332 | ||
333 | > words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
334 | > words p = FreeT $ do                   -- With 'next' we will inspect p's first chunk, excluding spaces; | |
335 | >   x <- next (p >-> dropWhile isSpace)  -- note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'. | |
336 | >   return $ case x of                   -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)' | |
337 | >     Left r -> Pure r                   -- 'Left' means we got no Text chunk, but only the return value; so we are done. | |
338 | >     Right (txt, p') -> Free $ do       -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad | |
339 | >       p'' <- view (break isSpace)      -- When we apply 'break isSpace', we get a Producer that returns a Producer; | |
340 | >         (yield txt >> p')              -- so here we yield everything up to the next space, and get the rest back. | |
341 | >       return (words p'')               -- We then carry on with the rest, which is likely to begin with space. | |
342 | ||
343 | -} |
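
The correspondence drawn in the `$special` section between `([Text], r)` and `Free ((,) Text) r` can be sketched with nothing beyond base. This is only an illustration, not code from the package: the `Free` type is defined locally rather than imported from a library, `String` stands in for `Text`, and `toList` and `fromList` are hypothetical helper names chosen for this example.

```haskell
-- A sketch only. 'Free' is written out by hand (an assumption made to avoid
-- any dependency beyond base), and 'String' stands in for 'Text'.

-- The free monad over a functor f.
data Free f r = Pure r | Free (f (Free f r))

-- Unroll nested pairs (s, (s, ... r)) into a list paired with the final value.
-- 'toList' is a hypothetical helper name.
toList :: Free ((,) s) r -> ([s], r)
toList (Pure r)         = ([], r)
toList (Free (s, rest)) = let (ss, r) = toList rest in (s : ss, r)

-- Rebuild the nested pairs from a list and a final value.
-- 'fromList' is a hypothetical helper name.
fromList :: ([s], r) -> Free ((,) s) r
fromList (ss, r) = foldr (\s rest -> Free (s, rest)) (Pure r) ss

main :: IO ()
main = do
  -- A value of the form (Text, (Text, r)) from the list of shapes above.
  let nested = Free ("one", Free ("two", Pure (0 :: Int)))
  print (toList nested)
  -- Going from the list form to the nested form and back returns the input.
  print (toList (fromList (["a", "b", "c"], True)))
```

Replacing the pair functor `(,) Text` with `Producer Text m`, and `Free` with `FreeT ... m`, gives the effectful analogue that the tutorial calls `Texts m r`: each layer is then a stream of chunks that must be run to reach the next layer, rather than a single strict value.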