aboutsummaryrefslogtreecommitdiffhomepage
path: root/Pipes/Text.hs
diff options
context:
space:
mode:
Diffstat (limited to 'Pipes/Text.hs')
-rw-r--r--Pipes/Text.hs291
1 files changed, 5 insertions, 286 deletions
diff --git a/Pipes/Text.hs b/Pipes/Text.hs
index 45b9299..7722f7f 100644
--- a/Pipes/Text.hs
+++ b/Pipes/Text.hs
@@ -1,24 +1,11 @@
1{-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} 1{-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-}
2 2
3module Pipes.Text ( 3{-| The module @Pipes.Text@ closely follows @Pipes.ByteString@ from
4 -- * Effectful Text 4 the @pipes-bytestring@ package. A draft tutorial can be found in
5 -- $intro 5 @Pipes.Text.Tutorial@.
6 6-}
7 -- * Lenses
8 -- $lenses
9
10 -- ** @view@ \/ @(^.)@
11 -- $view
12
13 -- ** @over@ \/ @(%~)@
14 -- $over
15
16 -- ** @zoom@
17 -- $zoom
18
19 -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@
20 -- $special
21 7
8module Pipes.Text (
22 -- * Producers 9 -- * Producers
23 fromLazy 10 fromLazy
24 11
@@ -141,274 +128,6 @@ import Prelude hiding (
141 words, 128 words,
142 writeFile ) 129 writeFile )
143 130
144{- $intro
145 This package provides @pipes@ utilities for /text streams/ or /character streams/,
146 realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/,
147 and thus you will generally want @Data.Text@ in scope. But the type
148 @Producer Text m r@ ,as we are using it, is a sort of /pipes/ equivalent of the lazy @Text@ type.
149
150 This particular module provides many functions equivalent in one way or another to
151 the pure functions in
152 <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>.
153 They transform, divide, group and fold text streams. Though @Producer Text m r@
154 is the type of \'effectful Text\', the functions in this module are \'pure\'
155 in the sense that they are uniformly monad-independent.
156 Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@
157 operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@
158 is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@.
159
160 The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of
161 strict Text: the implementation is arranged so that the individual strict 'Text'
162 chunks are kept to a reasonable size; the user is not aware of the divisions
163 between the connected 'Text' chunks.
164 So also here: the functions in this module are designed to operate on streams that
165 are insensitive to text boundaries. This means that they may freely split
166 text into smaller texts and /discard empty texts/. The objective, though, is
167 that they should /never concatenate texts/ in order to provide strict upper
168 bounds on memory usage.
169
170 For example, to stream only the first three lines of 'stdin' to 'stdout' you
171 might write:
172
173> import Pipes
174> import qualified Pipes.Text as Text
175> import qualified Pipes.Text.IO as Text
176> import Pipes.Group (takes')
177> import Lens.Family
178>
179> main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout
180> where
181> takeLines n = Text.unlines . takes' n . view Text.lines
182
183 The above program will never bring more than one chunk of text (~ 32 KB) into
184 memory, no matter how long the lines are.
185
186-}
187{- $lenses
188 As this example shows, one superficial difference from @Data.Text.Lazy@
189 is that many of the operations, like 'lines', are \'lensified\'; this has a
190 number of advantages (where it is possible); in particular it facilitates their
191 use with 'Parser's of Text (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse>
192 sense.) The disadvantage, famously, is that the messages you get for type errors can be
193 a little alarming. The remarks that follow in this section are for non-lens adepts.
194
195 Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the
196 intuitively corresponding function when used with @view@ or @(^.)@. Instead of
197 writing:
198
199 > splitAt 17 producer
200
201 as we would with the Prelude or Text functions, we write
202
203 > view (splitAt 17) producer
204
205 or equivalently
206
207 > producer ^. splitAt 17
208
209 This may seem a little indirect, but note that many equivalents of
210 @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively
211 corresponding functions by prefixing them with @(>->)@. Thus something like
212
213> stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines
214
215 would drop the leading white space from each line.
216
217 The lenses in this library are marked as /improper/; this just means that
218 they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/.
219 Just for this reason, though, the magnificent complexities of the lens libraries
220 are a distraction. The lens combinators to keep in mind, the ones that make sense for
221 our lenses, are @view@ \/ @(^.)@), @over@ \/ @(%~)@ , and @zoom@.
222
223 One need only keep in mind that if @l@ is a @Lens' a b@, then:
224
225-}
226{- $view
227 @view l@ is a function @a -> b@ . Thus @view l a@ (also written @a ^. l@ )
228 is the corresponding @b@; as was said above, this function will be exactly the
229 function you think it is, given its name. Thus to uppercase the first n characters
230 of a Producer, leaving the rest the same, we could write:
231
232
233 > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper
234 > p'
235-}
236{- $over
237 @over l@ is a function @(b -> b) -> a -> a@. Thus, given a function that modifies
238 @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to
239 the @b@ that we can \"see\" through the lens. So @over l f :: a -> a@
240 (it can also be written @l %~ f@).
241 For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@.
242 So above we might have written things like these:
243
244 > stripLines = Text.lines %~ maps (>-> Text.stripStart)
245 > stripLines = over Text.lines (maps (>-> Text.stripStart))
246 > upper n = Text.splitAt n %~ (>-> Text.toUpper)
247
248-}
249{- $zoom
250 @zoom l@, finally, is a function from a @Parser b m r@
251 to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@).
252 Its use is easiest to see with an decoding lens like 'utf8', which
253 \"sees\" a Text producer hidden inside a ByteString producer:
254 @drawChar@ is a Text parser, returning a @Maybe Char@, @zoom utf8 drawChar@ is
255 a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns
256 a list of everything produced from a Producer, leaving only the return value; it would
257 usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@
258 returns a list of Text chunks containing the first seventeen Chars, and returns the rest of
259 the Text Producer for further parsing. Suppose that we want, inexplicably, to
260 modify the casing of a Text Producer according to any instruction it might
261 contain at the start. Then we might write something like this:
262
263> obey :: Monad m => Producer Text m b -> Producer Text m b
264> obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p
265> let seven = T.concat ts
266> case T.toUpper seven of
267> "TOUPPER" -> p' >-> Text.toUpper
268> "TOLOWER" -> p' >-> Text.toLower
269> _ -> do yield seven
270> p'
271
272
273> >>> let doc = each ["toU","pperTh","is document.\n"]
274> >>> runEffect $ obey doc >-> Text.stdout
275> THIS DOCUMENT.
276
277 The purpose of exporting lenses is the mental economy achieved with this three-way
278 applicability. That one expression, e.g. @lines@ or @splitAt 17@ can have these
279 three uses is no more surprising than that a pipe can act as a function modifying
280 the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@
281 -- but can /also/ modify the inputs to a consumer by using @>->@ to its right:
282 @pipe >-> consumer@
283
284 The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by
285 both <http://hackage.haskell.org/package/lens lens> and
286 <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained
287 in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial>
288 and to some extent in the @Pipes.Text.Encoding@ module here.
289
290-}
291{- $special
292 These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ .
293 This is in the types that are most closely associated with our central text type,
294 @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like
295
296> splitAt :: Int -> Text -> (Text, Text)
297> lines :: Text -> [Text]
298> chunksOf :: Int -> Text -> [Text]
299
300 which relate a Text with a pair of Texts or a list of Texts.
301 The corresponding functions here (taking account of \'lensification\') are
302
303> view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r)
304> view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
305> view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r
306
307 Some of the types may be more readable if you imagine that we have introduced
308 our own type synonyms
309
310> type Text m r = Producer T.Text m r
311> type Texts m r = FreeT (Producer T.Text m) m r
312
313 Then we would think of the types above as
314
315> view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r)
316> view lines :: (Monad m) => Text m r -> Texts m r
317> view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r
318
319 which brings one closer to the types of the similar functions in @Data.Text.Lazy@
320
321 In the type @Producer Text m (Producer Text m r)@ the second
322 element of the \'pair\' of effectful Texts cannot simply be retrieved
323 with something like 'snd'. This is an \'effectful\' pair, and one must work
324 through the effects of the first element to arrive at the second Text stream, even
325 if you are proposing to throw the Text in the first element away.
326 Note that we use Control.Monad.join to fuse the pair back together, since it specializes to
327
328> join :: Monad m => Producer Text m (Producer m r) -> Producer m r
329
330 The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions,
331 @FreeT (Producer m Text) m r@ -- our @Texts m r@ -- is the type of (effectful)
332 lists of (effectful) texts. The type @([Text],r)@ might be seen to gather
333 together things of the forms:
334
335> r
336> (Text,r)
337> (Text, (Text, r))
338> (Text, (Text, (Text, r)))
339> (Text, (Text, (Text, (Text, r))))
340> ...
341
342 (We might also have identified the sum of those types with @Free ((,) Text) r@
343 -- or, more absurdly, @FreeT ((,) Text) Identity r@.)
344
345 Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called
346 @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence:
347
348> m r
349> Text m r
350> Text m (Text m r)
351> Text m (Text m (Text m r))
352> Text m (Text m (Text m (Text m r)))
353> ...
354
355 We might have used a more specialized type in place of @FreeT (Producer a m) m r@,
356 or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct
357 result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@ .
358
359 One might think that
360
361> lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r)
362> view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
363
364 should really have the type
365
366> lines :: Monad m => Pipe Text Text m r
367
368 as e.g. 'toUpper' does. But this would spoil the control we are
369 attempting to maintain over the size of chunks. It is in fact just
370 as unreasonable to want such a pipe as to want
371
372> Data.Text.Lazy.lines :: Text -> Text
373
374 to 'rechunk' the strict Text chunks inside the lazy Text to respect
375 line boundaries. In fact we have
376
377> Data.Text.Lazy.lines :: Text -> [Text]
378> Prelude.lines :: String -> [String]
379
380 where the elements of the list are themselves lazy Texts or Strings; the use
381 of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this.
382
383 The @Pipes.Group@ module, which can generally be imported without qualification,
384 provides many functions for working with things of type @FreeT (Producer a m) m r@.
385 In particular it conveniently exports the constructors for @FreeT@ and the associated
386 @FreeF@ type -- a fancy form of @Either@, namely
387
388> data FreeF f a b = Pure a | Free (f b)
389
390 for pattern-matching. Consider the implementation of the 'words' function, or
391 of the part of the lens that takes us to the words; it is compact but exhibits many
392 of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@
393 constuctors. Keep in mind that
394
395> newtype FreeT f m a = FreeT (m (FreeF f a (FreeT f m a)))
396> next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
397
398 Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity';
399 the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad:
400
401> words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
402> words p = FreeT $ do -- With 'next' we will inspect p's first chunk, excluding spaces;
403> x <- next (p >-> dropWhile isSpace) -- note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'.
404> return $ case x of -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)'
405> Left r -> Pure r -- 'Left' means we got no Text chunk, but only the return value; so we are done.
406> Right (txt, p') -> Free $ do -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad
407> p'' <- view (break isSpace) -- When we apply 'break isSpace', we get a Producer that returns a Producer;
408> (yield txt >> p') -- so here we yield everything up to the next space, and get the rest back.
409> return (words p'') -- We then carry on with the rest, which is likely to begin with space.
410
411-}
412 131
413-- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's 132-- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's
414fromLazy :: (Monad m) => TL.Text -> Producer' Text m () 133fromLazy :: (Monad m) => TL.Text -> Producer' Text m ()