Commit | Line | Data |
---|---|---|
1 | {-# OPTIONS_GHC -fno-warn-unused-imports #-} | |
2 | ||
3 | module Pipes.Text.Tutorial ( | |
4 | -- * Effectful Text | |
5 | -- $intro | |
6 | -- ** @Pipes.Text@ | |
7 | -- $pipestext | |
8 | -- ** @Pipes.Text.IO@ | |
9 | -- $pipestextio | |
10 | -- ** @Pipes.Text.Encoding@ | |
11 | -- $pipestextencoding | |
12 | -- * Lenses | |
13 | -- $lenses | |
14 | ||
15 | -- ** @view@ \/ @(^.)@ | |
16 | -- $view | |
17 | ||
18 | -- ** @over@ \/ @(%~)@ | |
19 | -- $over | |
20 | ||
21 | -- ** @zoom@ | |
22 | -- $zoom | |
23 | ||
24 | -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@ | |
25 | -- $special | |
26 | ) where | |
27 | ||
28 | import Pipes | |
29 | import Pipes.Text | |
30 | import Pipes.Text.IO | |
31 | import Pipes.Text.Encoding | |
32 | ||
33 | {- $intro | |
34 | This package provides @pipes@ utilities for /character streams/, | |
35 | realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/, | |
36 | and thus the @Text@ type we are using is the one from @Data.Text@, not @Data.Text.Lazy@. | |
37 | But the type @Producer Text m r@, as we are using it, is a sort of /pipes/ equivalent of | |
38 | the lazy @Text@ type. | |
39 | ||
40 | The main @Pipes.Text@ module provides many functions equivalent | |
41 | in one way or another to the pure functions in | |
42 | <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy> | |
43 | (and the corresponding @Prelude@ functions for @String@ s): they transform, | |
44 | divide, group and fold text streams. Though @Producer Text m r@ | |
45 | is the type of \'effectful Text\', the functions in @Pipes.Text@ are \'pure\' | |
46 | in the sense that they are uniformly monad-independent. | |
47 | Simple /IO/ operations are defined in @Pipes.Text.IO@ - as lazy IO @Text@ | |
48 | operations are in @Data.Text.Lazy.IO@. Similarly, as @Data.Text.Lazy.Encoding@ | |
49 | handles inter-operation with @Data.ByteString.Lazy@, @Pipes.Text.Encoding@ provides for | |
50 | interoperation with the \'effectful ByteStrings\' of @Pipes.ByteString@. | |
51 | ||
52 | Remember that the @Text@ type exported by @Data.Text.Lazy@ is basically | |
53 | that of a lazy list of strict @Text@: the implementation is arranged so that | |
54 | the individual strict 'Text' chunks are kept to a reasonable size; the user | |
55 | is not aware of the divisions between the connected 'Text' chunks, but uses | |
56 | operations akin to those for strict text. | |
57 | So also here: the functions in this module are designed to operate on character streams | |
58 | in a way that is independent of the boundaries of the underlying @Text@ chunks. | |
59 | This means that they may freely split text into smaller texts and /discard empty texts/. | |
60 | The objective, though, is that they should never /concatenate texts/, so that there is a strict upper | |
61 | bound on memory usage. | |
62 | ||
63 | For example, to stream only the first three lines of 'stdin' to 'stdout' you | |
64 | might write: | |
65 | ||
66 | > import Pipes | |
67 | > import qualified Pipes.Text as Text | |
68 | > import qualified Pipes.Text.IO as Text | |
69 | > import Pipes.Group (takes') | |
70 | > import Lens.Family (view) | |
71 | > | |
72 | > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout | |
73 | >   where | |
74 | >     takeLines n = view Text.unlines . takes' n . view Text.lines | |
75 | ||
76 | This program will never bring more into memory than what @Text.stdin@ considers | |
77 | one chunk of text (~ 32 KB), even if individual lines are split across many chunks. | |
78 | ||
79 | -} | |
80 | {- $lenses | |
81 | As the use of @view@ in this example shows, one superficial difference from @Data.Text.Lazy@ | |
82 | is that many of the operations, like 'lines', are \'lensified\'; this has a | |
83 | number of advantages; in particular it facilitates their use with 'Parser's of Text | |
84 | (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse> | |
85 | sense.) The remarks that follow in this section are for non-lens adepts. | |
86 | ||
87 | Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the | |
88 | intuitively corresponding function when used with @view@ or @(^.)@. Instead of | |
89 | writing: | |
90 | ||
91 | > splitAt 17 producer | |
92 | ||
93 | as we would with the Prelude or Text functions, we write | |
94 | ||
95 | > view (splitAt 17) producer | |
96 | ||
97 | or equivalently | |
98 | ||
99 | > producer ^. splitAt 17 | |
100 | ||
101 | This may seem a little indirect, but note that many equivalents of | |
102 | @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively | |
103 | corresponding functions by prefixing them with @(>->)@. Thus something like | |
104 | ||
105 | > stripLines = view Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines | |
106 | ||
107 | would drop the leading white space from each line. | |
108 | ||
109 | The lenses in this library are marked as /improper/; this just means that | |
110 | they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/. | |
111 | Just for this reason, though, the magnificent complexities of the lens libraries | |
112 | are a distraction. The lens combinators to keep in mind, the ones that make sense for | |
113 | our lenses, are @view@ \/ @(^.)@, @over@ \/ @(%~)@, and @zoom@. | |
114 | ||
115 | One need only keep in mind that if @l@ is a @Lens' a b@, then: | |
116 | ||
117 | -} | |
118 | {- $view | |
119 | @view l@ is a function @a -> b@ . Thus @view l a@ (also written @a ^. l@ ) | |
120 | is the corresponding @b@; as was said above, this function will typically be | |
121 | the pipes equivalent of the function you think it is, given its name. So for example | |
122 | ||
123 | > view (Text.drop) | |
124 | > view (Text.splitAt 300) :: Producer Text m r -> Producer Text m (Producer Text m r) | |
125 | > Text.stdin ^. Text.splitAt 300 :: Producer Text IO (Producer Text IO ()) | |
126 | ||
127 | I.e., it produces the first 300 characters, and returns the rest of the producer. | |
128 | Thus to uppercase the first n characters | |
129 | of a Producer, leaving the rest the same, we could write: | |
130 | ||
131 | ||
132 | > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper | |
133 | >                p' | |
134 | -} | |
135 | {- $over | |
136 | @over l@ is a function @(b -> b) -> a -> a@. Thus, given a function that modifies | |
137 | @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to | |
138 | the @b@ that we can \"see\" through the lens. So @over l f :: a -> a@ | |
139 | (it can also be written @l %~ f@). | |
140 | For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@. | |
141 | So above we might have written things like these: | |
142 | ||
143 | > stripLines = Text.lines %~ maps (>-> Text.stripStart) | |
144 | > stripLines = over Text.lines (maps (>-> Text.stripStart)) | |
145 | > upper n = Text.splitAt n %~ (>-> Text.toUpper) | |
146 | ||
147 | -} | |
148 | {- $zoom | |
149 | @zoom l@, finally, is a function from a @Parser b m r@ | |
150 | to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@). | |
151 | Its use is easiest to see with a decoding lens like 'utf8', which | |
152 | \"sees\" a Text producer hidden inside a ByteString producer: | |
153 | @drawChar@ is a Text parser, returning a @Maybe Char@; @zoom utf8 drawChar@ is | |
154 | a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns | |
155 | a list of everything produced from a Producer, leaving only the return value; it would | |
156 | usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@ | |
157 | returns a list of Text chunks containing the first seventeen Chars, and returns the rest of | |
158 | the Text Producer for further parsing. Suppose that we want, inexplicably, to | |
159 | modify the casing of a Text Producer according to any instruction it might | |
160 | contain at the start. Then we might write something like this: | |
161 | ||
162 | > obey :: Monad m => Producer Text m b -> Producer Text m b | |
163 | > obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p | |
164 | >             let seven = T.concat ts | |
165 | >             case T.toUpper seven of | |
166 | >                "TOUPPER" -> p' >-> Text.toUpper | |
167 | >                "TOLOWER" -> p' >-> Text.toLower | |
168 | >                _         -> do yield seven | |
169 | >                                p' | |
170 | ||
171 | ||
172 | > >>> let doc = each ["toU","pperTh","is document.\n"] | |
173 | > >>> runEffect $ obey doc >-> Text.stdout | |
174 | > THIS DOCUMENT. | |
175 | ||
176 | The purpose of exporting lenses is the mental economy achieved with this three-way | |
177 | applicability. That one expression, e.g. @lines@ or @splitAt 17@, can have these | |
178 | three uses is no more surprising than that a pipe can act as a function modifying | |
179 | the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@ | |
180 | -- but can /also/ modify the inputs to a consumer by using @>->@ to its right: | |
181 | @pipe >-> consumer@ | |
182 | ||
183 | The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by | |
184 | both <http://hackage.haskell.org/package/lens lens> and | |
185 | <http://hackage.haskell.org/package/lens-family lens-family>. The use of 'zoom' is explained | |
186 | in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial> | |
187 | and to some extent in the @Pipes.Text.Encoding@ module here. | |
188 | ||
189 | -} | |
190 | {- $special | |
191 | These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ . | |
192 | This is in the types that are most closely associated with our central text type, | |
193 | @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like | |
194 | ||
195 | > splitAt :: Int -> Text -> (Text, Text) | |
196 | > lines :: Text -> [Text] | |
197 | > chunksOf :: Int -> Text -> [Text] | |
198 | ||
199 | which relate a Text with a pair of Texts or a list of Texts. | |
200 | The corresponding functions here (taking account of \'lensification\') are | |
201 | ||
202 | > view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r) | |
203 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
204 | > view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r | |
205 | ||
206 | Some of the types may be more readable if you imagine that we have introduced | |
207 | our own type synonyms | |
208 | ||
209 | > type Text m r = Producer T.Text m r | |
210 | > type Texts m r = FreeT (Producer T.Text m) m r | |
211 | ||
212 | Then we would think of the types above as | |
213 | ||
214 | > view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) | |
215 | > view lines :: (Monad m) => Text m r -> Texts m r | |
216 | > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r | |
217 | ||
218 | which brings one closer to the types of the similar functions in @Data.Text.Lazy@. | |
219 | ||
220 | In the type @Producer Text m (Producer Text m r)@ the second | |
221 | element of the \'pair\' of effectful Texts cannot simply be retrieved | |
222 | with something like 'snd'. This is an \'effectful\' pair, and one must work | |
223 | through the effects of the first element to arrive at the second Text stream, even | |
224 | if you are proposing to throw the Text in the first element away. | |
225 | Note that we use @Control.Monad.join@ to fuse the pair back together, since it specializes to | |
226 | ||
227 | > join :: Monad m => Producer Text m (Producer Text m r) -> Producer Text m r | |
228 | ||
229 | The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions, | |
230 | @FreeT (Producer Text m) m r@ -- our @Texts m r@ -- is the type of (effectful) | |
231 | lists of (effectful) texts. The type @([Text],r)@ might be seen to gather | |
232 | together things of the forms: | |
233 | ||
234 | > r | |
235 | > (Text,r) | |
236 | > (Text, (Text, r)) | |
237 | > (Text, (Text, (Text, r))) | |
238 | > (Text, (Text, (Text, (Text, r)))) | |
239 | > ... | |
240 | ||
241 | (We might also have identified the sum of those types with @Free ((,) Text) r@ | |
242 | -- or, more absurdly, @FreeT ((,) Text) Identity r@.) | |
243 | ||
244 | Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called | |
245 | @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence: | |
246 | ||
247 | > m r | |
248 | > Text m r | |
249 | > Text m (Text m r) | |
250 | > Text m (Text m (Text m r)) | |
251 | > Text m (Text m (Text m (Text m r))) | |
252 | > ... | |
253 | ||
254 | We might have used a more specialized type in place of @FreeT (Producer a m) m r@, | |
255 | or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct | |
256 | result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@ . | |
257 | ||
258 | One might think that | |
259 | ||
260 | > lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r) | |
261 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
262 | ||
263 | should really have the type | |
264 | ||
265 | > lines :: Monad m => Pipe Text Text m r | |
266 | ||
267 | as e.g. 'toUpper' does. But this would spoil the control we are | |
268 | attempting to maintain over the size of chunks. It is in fact just | |
269 | as unreasonable to want such a pipe as to want | |
270 | ||
271 | > Data.Text.Lazy.lines :: Text -> Text | |
272 | ||
273 | to 'rechunk' the strict Text chunks inside the lazy Text to respect | |
274 | line boundaries. In fact we have | |
275 | ||
276 | > Data.Text.Lazy.lines :: Text -> [Text] | |
277 | > Prelude.lines :: String -> [String] | |
278 | ||
279 | where the elements of the list are themselves lazy Texts or Strings; the use | |
280 | of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this. | |
281 | ||
282 | The @Pipes.Group@ module, which can generally be imported without qualification, | |
283 | provides many functions for working with things of type @FreeT (Producer a m) m r@. | |
284 | In particular it conveniently exports the constructors for @FreeT@ and the associated | |
285 | @FreeF@ type -- a fancy form of @Either@, namely | |
286 | ||
287 | > data FreeF f a b = Pure a | Free (f b) | |
288 | ||
289 | for pattern-matching. Consider the implementation of the 'words' function, or | |
290 | of the part of the lens that takes us to the words; it is compact but exhibits many | |
291 | of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@ | |
292 | constructors. Keep in mind that | |
293 | ||
294 | > newtype FreeT f m a = FreeT (m (FreeF f a (FreeT f m a))) | |
295 | > next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r)) | |
296 | ||
297 | Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity'; | |
298 | the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad: | |
299 | ||
300 | > words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | |
301 | > words p = FreeT $ do                   -- With 'next' we will inspect p's first chunk, excluding spaces; | |
302 | >   x <- next (p >-> dropWhile isSpace)  -- note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'. | |
303 | >   return $ case x of                   -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)' | |
304 | >     Left   r       -> Pure r           -- 'Left' means we got no Text chunk, but only the return value; so we are done. | |
305 | >     Right (txt, p') -> Free $ do       -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad | |
306 | >       p'' <- view (break isSpace)      -- When we apply 'break isSpace', we get a Producer that returns a Producer; | |
307 | >                   (yield txt >> p')    -- so here we yield everything up to the next space, and get the rest back. | |
308 | >       return (words p'')               -- We then carry on with the rest, which is likely to begin with space. | |
309 | ||
310 | -} |
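The 'effectful pair' type @Producer Text m (Producer Text m r)@ discussed in the tutorial can be explored without the lens machinery. What follows is a minimal sketch that assumes only the @pipes@ and @text@ packages. The name @splitAtChars@ is hypothetical and is not the library's 'splitAt' lens; it is a hand-written stand-in that yields the first @n@ characters and returns the remainder, so that one can see why the second stream is reachable only by running the first, and how @join@ fuses the pair back together.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad (join)
import Data.Text (Text)
import qualified Data.Text as T
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical helper (an assumption for illustration, not part of pipes-text):
-- yield the first n characters of the stream, then return the rest of it.
-- Chunks are only ever split, never concatenated.
splitAtChars :: Monad m => Int -> Producer Text m r -> Producer Text m (Producer Text m r)
splitAtChars n p
  | n <= 0    = return p
  | otherwise = do
      x <- lift (next p)                 -- inspect the next chunk in the base monad
      case x of
        Left r          -> return (return r)   -- stream ended early; nothing remains
        Right (txt, p')
          | T.length txt <= n -> do
              yield txt
              splitAtChars (n - T.length txt) p'
          | otherwise -> do
              let (pre, suf) = T.splitAt n txt
              yield pre
              return (yield suf >> p')   -- push the unused suffix back in front

-- The sample document used in the tutorial's 'obey' example.
doc :: Monad m => Producer Text m ()
doc = each ["toU", "pperTh", "is document.\n"]

main :: IO ()
main = do
  -- Running the first element of the pair is the only way to obtain the second.
  rest <- runEffect (for (splitAtChars 7 doc) (lift . print))
  -- prints "toU" and then "pper"
  remaining <- P.toListM rest
  print remaining
  -- prints ["Th","is document.\n"]
  -- 'join' fuses the pair back into a single stream with the same characters.
  print (T.concat (P.toList (join (splitAtChars 7 doc))))
  -- prints "toUpperThis document.\n"
```

Note that the chunk boundaries after @join@ differ from those of the original stream (the chunk @"pperTh"@ has been split in two), which is consistent with the tutorial's remark that functions may split texts but should not concatenate them.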