-rw-r--r--  Pipes/Text.hs            291
-rw-r--r--  Pipes/Text/Tutorial.hs   310
-rw-r--r--  examples/attoparser.hs    21
-rw-r--r--  examples/decode.hs        30
-rw-r--r--  examples/lines_url.hs     37
-rw-r--r--  examples/zoom.hs         152
-rw-r--r--  pipes-text.cabal           8
7 files changed, 559 insertions, 290 deletions
diff --git a/Pipes/Text.hs b/Pipes/Text.hs
index 45b9299..7722f7f 100644
--- a/Pipes/Text.hs
+++ b/Pipes/Text.hs
@@ -1,24 +1,11 @@
1{-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} 1{-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-}
2 2
3module Pipes.Text (
4 -- * Effectful Text
5 -- $intro
6
7 -- * Lenses
8 -- $lenses
9
10 -- ** @view@ \/ @(^.)@
11 -- $view
12
13 -- ** @over@ \/ @(%~)@
14 -- $over
15
16 -- ** @zoom@
17 -- $zoom
18
19 -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@
20 -- $special
21
3{-| The module @Pipes.Text@ closely follows @Pipes.ByteString@ from
4 the @pipes-bytestring@ package. A draft tutorial can be found in
5 @Pipes.Text.Tutorial@.
6-}
7
8module Pipes.Text (
22 -- * Producers 9 -- * Producers
23 fromLazy 10 fromLazy
24 11
@@ -141,274 +128,6 @@ import Prelude hiding (
141 words, 128 words,
142 writeFile ) 129 writeFile )
143 130
144{- $intro
145 This package provides @pipes@ utilities for /text streams/ or /character streams/,
146 realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/,
147 and thus you will generally want @Data.Text@ in scope. But the type
148 @Producer Text m r@ ,as we are using it, is a sort of /pipes/ equivalent of the lazy @Text@ type.
149
150 This particular module provides many functions equivalent in one way or another to
151 the pure functions in
152 <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>.
153 They transform, divide, group and fold text streams. Though @Producer Text m r@
154 is the type of \'effectful Text\', the functions in this module are \'pure\'
155 in the sense that they are uniformly monad-independent.
156 Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@
157 operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@
158 is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@.
159
160 The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of
161 strict Text: the implementation is arranged so that the individual strict 'Text'
162 chunks are kept to a reasonable size; the user is not aware of the divisions
163 between the connected 'Text' chunks.
164 So also here: the functions in this module are designed to operate on streams that
165 are insensitive to text boundaries. This means that they may freely split
166 text into smaller texts and /discard empty texts/. The objective, though, is
167 that they should /never concatenate texts/ in order to provide strict upper
168 bounds on memory usage.
169
170 For example, to stream only the first three lines of 'stdin' to 'stdout' you
171 might write:
172
173> import Pipes
174> import qualified Pipes.Text as Text
175> import qualified Pipes.Text.IO as Text
176> import Pipes.Group (takes')
177> import Lens.Family
178>
179> main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout
180> where
181> takeLines n = Text.unlines . takes' n . view Text.lines
182
183 The above program will never bring more than one chunk of text (~ 32 KB) into
184 memory, no matter how long the lines are.
185
186-}
187{- $lenses
188 As this example shows, one superficial difference from @Data.Text.Lazy@
189 is that many of the operations, like 'lines', are \'lensified\'; this has a
190 number of advantages (where it is possible); in particular it facilitates their
191 use with 'Parser's of Text (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse>
192 sense.) The disadvantage, famously, is that the messages you get for type errors can be
193 a little alarming. The remarks that follow in this section are for non-lens adepts.
194
195 Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the
196 intuitively corresponding function when used with @view@ or @(^.)@. Instead of
197 writing:
198
199 > splitAt 17 producer
200
201 as we would with the Prelude or Text functions, we write
202
203 > view (splitAt 17) producer
204
205 or equivalently
206
207 > producer ^. splitAt 17
208
209 This may seem a little indirect, but note that many equivalents of
210 @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively
211 corresponding functions by prefixing them with @(>->)@. Thus something like
212
213> stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines
214
215 would drop the leading white space from each line.
216
217 The lenses in this library are marked as /improper/; this just means that
218 they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/.
219 Just for this reason, though, the magnificent complexities of the lens libraries
220 are a distraction. The lens combinators to keep in mind, the ones that make sense for
221 our lenses, are @view@ \/ @(^.)@), @over@ \/ @(%~)@ , and @zoom@.
222
223 One need only keep in mind that if @l@ is a @Lens' a b@, then:
224
225-}
226{- $view
227 @view l@ is a function @a -> b@ . Thus @view l a@ (also written @a ^. l@ )
228 is the corresponding @b@; as was said above, this function will be exactly the
229 function you think it is, given its name. Thus to uppercase the first n characters
230 of a Producer, leaving the rest the same, we could write:
231
232
233 > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper
234 > p'
235-}
236{- $over
237 @over l@ is a function @(b -> b) -> a -> a@. Thus, given a function that modifies
238 @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to
239 the @b@ that we can \"see\" through the lens. So @over l f :: a -> a@
240 (it can also be written @l %~ f@).
241 For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@.
242 So above we might have written things like these:
243
244 > stripLines = Text.lines %~ maps (>-> Text.stripStart)
245 > stripLines = over Text.lines (maps (>-> Text.stripStart))
246 > upper n = Text.splitAt n %~ (>-> Text.toUpper)
247
248-}
249{- $zoom
250 @zoom l@, finally, is a function from a @Parser b m r@
251 to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@).
252 Its use is easiest to see with an decoding lens like 'utf8', which
253 \"sees\" a Text producer hidden inside a ByteString producer:
254 @drawChar@ is a Text parser, returning a @Maybe Char@, @zoom utf8 drawChar@ is
255 a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns
256 a list of everything produced from a Producer, leaving only the return value; it would
257 usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@
258 returns a list of Text chunks containing the first seventeen Chars, and returns the rest of
259 the Text Producer for further parsing. Suppose that we want, inexplicably, to
260 modify the casing of a Text Producer according to any instruction it might
261 contain at the start. Then we might write something like this:
262
263> obey :: Monad m => Producer Text m b -> Producer Text m b
264> obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p
265> let seven = T.concat ts
266> case T.toUpper seven of
267> "TOUPPER" -> p' >-> Text.toUpper
268> "TOLOWER" -> p' >-> Text.toLower
269> _ -> do yield seven
270> p'
271
272
273> >>> let doc = each ["toU","pperTh","is document.\n"]
274> >>> runEffect $ obey doc >-> Text.stdout
275> THIS DOCUMENT.
276
277 The purpose of exporting lenses is the mental economy achieved with this three-way
278 applicability. That one expression, e.g. @lines@ or @splitAt 17@ can have these
279 three uses is no more surprising than that a pipe can act as a function modifying
280 the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@
281 -- but can /also/ modify the inputs to a consumer by using @>->@ to its right:
282 @pipe >-> consumer@
283
284 The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by
285 both <http://hackage.haskell.org/package/lens lens> and
286 <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained
287 in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial>
288 and to some extent in the @Pipes.Text.Encoding@ module here.
289
290-}
291{- $special
292 These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ .
293 This is in the types that are most closely associated with our central text type,
294 @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like
295
296> splitAt :: Int -> Text -> (Text, Text)
297> lines :: Text -> [Text]
298> chunksOf :: Int -> Text -> [Text]
299
300 which relate a Text with a pair of Texts or a list of Texts.
301 The corresponding functions here (taking account of \'lensification\') are
302
303> view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r)
304> view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
305> view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r
306
307 Some of the types may be more readable if you imagine that we have introduced
308 our own type synonyms
309
310> type Text m r = Producer T.Text m r
311> type Texts m r = FreeT (Producer T.Text m) m r
312
313 Then we would think of the types above as
314
315> view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r)
316> view lines :: (Monad m) => Text m r -> Texts m r
317> view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r
318
319 which brings one closer to the types of the similar functions in @Data.Text.Lazy@
320
321 In the type @Producer Text m (Producer Text m r)@ the second
322 element of the \'pair\' of effectful Texts cannot simply be retrieved
323 with something like 'snd'. This is an \'effectful\' pair, and one must work
324 through the effects of the first element to arrive at the second Text stream, even
325 if you are proposing to throw the Text in the first element away.
326 Note that we use Control.Monad.join to fuse the pair back together, since it specializes to
327
328> join :: Monad m => Producer Text m (Producer m r) -> Producer m r
329
330 The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions,
331 @FreeT (Producer m Text) m r@ -- our @Texts m r@ -- is the type of (effectful)
332 lists of (effectful) texts. The type @([Text],r)@ might be seen to gather
333 together things of the forms:
334
335> r
336> (Text,r)
337> (Text, (Text, r))
338> (Text, (Text, (Text, r)))
339> (Text, (Text, (Text, (Text, r))))
340> ...
341
342 (We might also have identified the sum of those types with @Free ((,) Text) r@
343 -- or, more absurdly, @FreeT ((,) Text) Identity r@.)
344
345 Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called
346 @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence:
347
348> m r
349> Text m r
350> Text m (Text m r)
351> Text m (Text m (Text m r))
352> Text m (Text m (Text m (Text m r)))
353> ...
354
355 We might have used a more specialized type in place of @FreeT (Producer a m) m r@,
356 or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct
357 result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@ .
358
359 One might think that
360
361> lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r)
362> view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
363
364 should really have the type
365
366> lines :: Monad m => Pipe Text Text m r
367
368 as e.g. 'toUpper' does. But this would spoil the control we are
369 attempting to maintain over the size of chunks. It is in fact just
370 as unreasonable to want such a pipe as to want
371
372> Data.Text.Lazy.lines :: Text -> Text
373
374 to 'rechunk' the strict Text chunks inside the lazy Text to respect
375 line boundaries. In fact we have
376
377> Data.Text.Lazy.lines :: Text -> [Text]
378> Prelude.lines :: String -> [String]
379
380 where the elements of the list are themselves lazy Texts or Strings; the use
381 of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this.
382
383 The @Pipes.Group@ module, which can generally be imported without qualification,
384 provides many functions for working with things of type @FreeT (Producer a m) m r@.
385 In particular it conveniently exports the constructors for @FreeT@ and the associated
386 @FreeF@ type -- a fancy form of @Either@, namely
387
388> data FreeF f a b = Pure a | Free (f b)
389
390 for pattern-matching. Consider the implementation of the 'words' function, or
391 of the part of the lens that takes us to the words; it is compact but exhibits many
392 of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@
393 constuctors. Keep in mind that
394
395> newtype FreeT f m a = FreeT (m (FreeF f a (FreeT f m a)))
396> next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
397
398 Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity';
399 the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad:
400
401> words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
402> words p = FreeT $ do -- With 'next' we will inspect p's first chunk, excluding spaces;
403> x <- next (p >-> dropWhile isSpace) -- note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'.
404> return $ case x of -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)'
405> Left r -> Pure r -- 'Left' means we got no Text chunk, but only the return value; so we are done.
406> Right (txt, p') -> Free $ do -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad
407> p'' <- view (break isSpace) -- When we apply 'break isSpace', we get a Producer that returns a Producer;
408> (yield txt >> p') -- so here we yield everything up to the next space, and get the rest back.
409> return (words p'') -- We then carry on with the rest, which is likely to begin with space.
410
411-}
412 131
413-- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's 132-- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's
414fromLazy :: (Monad m) => TL.Text -> Producer' Text m () 133fromLazy :: (Monad m) => TL.Text -> Producer' Text m ()
diff --git a/Pipes/Text/Tutorial.hs b/Pipes/Text/Tutorial.hs
new file mode 100644
index 0000000..07b8751
--- /dev/null
+++ b/Pipes/Text/Tutorial.hs
@@ -0,0 +1,310 @@
1{-# OPTIONS_GHC -fno-warn-unused-imports #-}
2
3module Pipes.Text.Tutorial (
4 -- * Effectful Text
5 -- $intro
6 -- ** @Pipes.Text@
7 -- $pipestext
8 -- ** @Pipes.Text.IO@
9 -- $pipestextio
10 -- ** @Pipes.Text.Encoding@
11 -- $pipestextencoding
12 -- * Lenses
13 -- $lenses
14
15 -- ** @view@ \/ @(^.)@
16 -- $view
17
18 -- ** @over@ \/ @(%~)@
19 -- $over
20
21 -- ** @zoom@
22 -- $zoom
23
24 -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@
25 -- $special
26 ) where
27
28import Pipes
29import Pipes.Text
30import Pipes.Text.IO
31import Pipes.Text.Encoding
32
33{- $intro
34 This package provides @pipes@ utilities for /character streams/,
35 realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/,
36 and thus the @Text@ type we are using is the one from @Data.Text@, not @Data.Text.Lazy@.
37 But the type @Producer Text m r@, as we are using it, is a sort of /pipes/ equivalent of
38 the lazy @Text@ type.
39
40 The main @Pipes.Text@ module provides many functions equivalent
41 in one way or another to the pure functions in
42 <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>
43 (and the corresponding @Prelude@ functions for @String@ s): they transform,
44 divide, group and fold text streams. Though @Producer Text m r@
45 is the type of \'effectful Text\', the functions in @Pipes.Text@ are \'pure\'
46 in the sense that they are uniformly monad-independent.
47 Simple /IO/ operations are defined in @Pipes.Text.IO@ - as lazy IO @Text@
48 operations are in @Data.Text.Lazy.IO@. Similarly, as @Data.Text.Lazy.Encoding@
49 handles inter-operation with @Data.ByteString.Lazy@, @Pipes.Text.Encoding@ provides for
50 interoperation with the \'effectful ByteStrings\' of @Pipes.ByteString@.
51
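    For instance (a sketch, not taken from the library itself: it assumes
    @Pipes.ByteString@ imported qualified as @Bytes@, @view@ from @Lens.Family@
    and @utf8@ from @Pipes.Text.Encoding@), standard input can be viewed as a
    Text producer whose return value is whatever byte stream remains undecoded:

> textStdin :: MonadIO m => Producer Text m (Producer ByteString m ())
> textStdin = view utf8 Bytes.stdin
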
52 Remember that the @Text@ type exported by @Data.Text.Lazy@ is basically
53 that of a lazy list of strict @Text@: the implementation is arranged so that
54 the individual strict 'Text' chunks are kept to a reasonable size; the user
55 is not aware of the divisions between the connected 'Text' chunks, but uses
56 operations akin to those for strict text.
57 So also here: the functions in @Pipes.Text@ are designed to operate on character streams
58 in a way that is independent of the boundaries of the underlying @Text@ chunks.
59 This means that they may freely split text into smaller texts and /discard empty texts/.
60 The objective, though, is that they should not /concatenate texts/ in order to provide strict upper
61 bounds on memory usage.
62
63 For example, to stream only the first three lines of 'stdin' to 'stdout' you
64 might write:
65
66> import Pipes
67> import qualified Pipes.Text as Text
68> import qualified Pipes.Text.IO as Text
69> import Pipes.Group (takes')
70> import Lens.Family (view)
71>
72> main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout
73> where
74> takeLines n = view Text.unlines . takes' n . view Text.lines
75
76 This program will never bring more into memory than what @Text.stdin@ considers
77 one chunk of text (~ 32 KB), even if individual lines are split across many chunks.
78
79-}
80{- $lenses
81 As the use of @view@ in this example shows, one superficial difference from @Data.Text.Lazy@
82 is that many of the operations, like 'lines', are \'lensified\'. This has a
83 number of advantages; in particular, it facilitates their use with 'Parser's of Text
84 (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse>
85 sense). The remarks that follow in this section are for non-lens adepts.
86
87 Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the
88 intuitively corresponding function when used with @view@ or @(^.)@. Instead of
89 writing:
90
91 > splitAt 17 producer
92
93 as we would with the Prelude or Text functions, we write
94
95 > view (splitAt 17) producer
96
97 or equivalently
98
99 > producer ^. splitAt 17
100
101 This may seem a little indirect, but note that many equivalents of
102 @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively
103 corresponding functions by prefixing them with @(>->)@. Thus something like
104
105> stripLines = view Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines
106
107 would drop the leading white space from each line.
108
109 The lenses in this library are marked as /improper/; this just means that
110 they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/.
111 Just for this reason, though, the magnificent complexities of the lens libraries
112 are a distraction. The lens combinators to keep in mind, the ones that make sense for
113 our lenses, are @view@ \/ @(^.)@, @over@ \/ @(%~)@, and @zoom@.
114
115 One need only keep in mind that if @l@ is a @Lens' a b@, then:
116
117-}
118{- $view
119 @view l@ is a function @a -> b@ . Thus @view l a@ (also written @a ^. l@ )
120 is the corresponding @b@; as was said above, this function will typically be
121 the pipes equivalent of the function you think it is, given its name. So for example
122
123 > view (Text.splitAt 300)
124 >     :: Monad m => Producer Text m r -> Producer Text m (Producer Text m r)
125 > Text.stdin ^. Text.splitAt 300 :: Producer Text IO (Producer Text IO r)
126
127 I.e., it produces the first 300 characters, and returns the rest of the producer.
128 Thus to uppercase the first n characters
129 of a Producer, leaving the rest the same, we could write:
130
131
132 > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper
133 > p'
134-}
135{- $over
136 @over l@ is a function @(b -> b) -> a -> a@. Thus, given a function that modifies
137 @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to
138 the @b@ that we can \"see\" through the lens. So @over l f :: a -> a@
139 (it can also be written @l %~ f@).
140 For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@.
141 So above we might have written things like these:
142
143 > stripLines = Text.lines %~ maps (>-> Text.stripStart)
144 > stripLines = over Text.lines (maps (>-> Text.stripStart))
145 > upper n = Text.splitAt n %~ (>-> Text.toUpper)
146
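    For instance, the second @stripLines@ above can be run as a complete program
    (a sketch: it assumes the imports of the earlier @takeLines@ example, plus
    @over@ from @Lens.Family@ and @Pipes.Group@ imported qualified as @Group@):

> main = runEffect $ stripLines Text.stdin >-> Text.stdout
>   where stripLines = over Text.lines (Group.maps (>-> Text.stripStart))
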
147-}
148{- $zoom
149 @zoom l@, finally, is a function from a @Parser b m r@
150 to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@).
151 Its use is easiest to see with a decoding lens like 'utf8', which
152 \"sees\" a Text producer hidden inside a ByteString producer:
153 @drawChar@ is a Text parser, returning a @Maybe Char@; @zoom utf8 drawChar@ is
154 a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns
155 a list of everything produced from a Producer, leaving only the return value; it would
156 usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@
157 returns a list of Text chunks containing the first seventeen Chars, leaving the rest of
158 the Text Producer in place for further parsing. Suppose that we want, inexplicably, to
159 modify the casing of a Text Producer according to any instruction it might
160 contain at the start. Then we might write something like this:
161
162> obey :: Monad m => Producer Text m b -> Producer Text m b
163> obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p
164> let seven = T.concat ts
165> case T.toUpper seven of
166> "TOUPPER" -> p' >-> Text.toUpper
167> "TOLOWER" -> p' >-> Text.toLower
168> _ -> do yield seven
169> p'
170
171
172> >>> let doc = each ["toU","pperTh","is document.\n"]
173> >>> runEffect $ obey doc >-> Text.stdout
174> THIS DOCUMENT.
175
176 The purpose of exporting lenses is the mental economy achieved with this three-way
177 applicability. That one expression, e.g. @lines@ or @splitAt 17@, can have these
178 three uses is no more surprising than that a pipe can act as a function modifying
179 the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@
180 -- but can /also/ modify the inputs to a consumer by using @>->@ to its right:
181 @pipe >-> consumer@
182
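    To make the three-way applicability concrete, here is @splitAt 17@ in each of
    its roles (a sketch with the types specialized by hand; @zoom@ is the one from
    @Lens.Family.State.Strict@ and @drawAll@ comes from @Pipes.Parse@):

> view (Text.splitAt 17)                    :: Monad m => Producer Text m r -> Producer Text m (Producer Text m r)
> over (Text.splitAt 17) (>-> Text.toUpper) :: Monad m => Producer Text m r -> Producer Text m r
> zoom (Text.splitAt 17) drawAll            :: Monad m => Parser Text m [Text]
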
183 The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by
184 both <http://hackage.haskell.org/package/lens lens> and
185 <http://hackage.haskell.org/package/lens-family lens-family>. The use of 'zoom' is explained
186 in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial>
187 and to some extent in the @Pipes.Text.Encoding@ module here.
188
189-}
190{- $special
191 These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ .
192 This is in the types that are most closely associated with our central text type,
193 @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like
194
195> splitAt :: Int -> Text -> (Text, Text)
196> lines :: Text -> [Text]
197> chunksOf :: Int -> Text -> [Text]
198
199 which relate a Text with a pair of Texts or a list of Texts.
200 The corresponding functions here (taking account of \'lensification\') are
201
202> view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r)
203> view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
204> view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r
205
206 Some of the types may be more readable if you imagine that we have introduced
207 our own type synonyms
208
209> type Text m r = Producer T.Text m r
210> type Texts m r = FreeT (Producer T.Text m) m r
211
212 Then we would think of the types above as
213
214> view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r)
215> view lines :: (Monad m) => Text m r -> Texts m r
216> view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r
217
218 which brings one closer to the types of the similar functions in @Data.Text.Lazy@.
219
220 In the type @Producer Text m (Producer Text m r)@ the second
221 element of the \'pair\' of effectful Texts cannot simply be retrieved
222 with something like 'snd'. This is an \'effectful\' pair, and one must work
223 through the effects of the first element to arrive at the second Text stream, even
224 if you are proposing to throw the Text in the first element away.
225 Note that we use Control.Monad.join to fuse the pair back together, since it specializes to
226
227> join :: Monad m => Producer Text m (Producer Text m r) -> Producer Text m r
228
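    For example, to reach the second element we can run the first producer and simply
    discard what it yields. This sketch is not a library function; it assumes
    @Pipes.Prelude@ imported qualified as @P@:

> -- perform the effects of the first ten characters, keeping only the rest
> restAfter10 :: Monad m => Producer Text m r -> m (Producer Text m r)
> restAfter10 p = runEffect (view (Text.splitAt 10) p >-> P.drain)
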
229 The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions,
230 @FreeT (Producer Text m) m r@ -- our @Texts m r@ -- is the type of (effectful)
231 lists of (effectful) texts. The type @([Text],r)@ might be seen to gather
232 together things of the forms:
233
234> r
235> (Text,r)
236> (Text, (Text, r))
237> (Text, (Text, (Text, r)))
238> (Text, (Text, (Text, (Text, r))))
239> ...
240
241 (We might also have identified the sum of those types with @Free ((,) Text) r@
242 -- or, more absurdly, @FreeT ((,) Text) Identity r@.)
243
244 Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called
245 @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence:
246
247> m r
248> Text m r
249> Text m (Text m r)
250> Text m (Text m (Text m r))
251> Text m (Text m (Text m (Text m r)))
252> ...
253
254 We might have used a more specialized type in place of @FreeT (Producer a m) m r@,
255 or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct
256 result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@ .
257
258 One might think that
259
260> lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r)
261> view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
262
263 should really have the type
264
265> lines :: Monad m => Pipe Text Text m r
266
267 as e.g. 'toUpper' does. But this would spoil the control we are
268 attempting to maintain over the size of chunks. It is in fact just
269 as unreasonable to want such a pipe as to want
270
271> Data.Text.Lazy.lines :: Text -> Text
272
273 to 'rechunk' the strict Text chunks inside the lazy Text to respect
274 line boundaries. In fact we have
275
276> Data.Text.Lazy.lines :: Text -> [Text]
277> Prelude.lines :: String -> [String]
278
279 where the elements of the list are themselves lazy Texts or Strings; the use
280 of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this.
281
282 The @Pipes.Group@ module, which can generally be imported without qualification,
283 provides many functions for working with things of type @FreeT (Producer a m) m r@.
284 In particular it conveniently exports the constructors for @FreeT@ and the associated
285 @FreeF@ type -- a fancy form of @Either@, namely
286
287> data FreeF f a b = Pure a | Free (f b)
288
289 for pattern-matching. Consider the implementation of the 'words' function, or
290 of the part of the lens that takes us to the words; it is compact but exhibits many
291 of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@
292 constructors. Keep in mind that
293
294> newtype FreeT f m a = FreeT (m (FreeF f a (FreeT f m a)))
295> next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
296
297 Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity';
298 the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad:
299
300> words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
301> words p = FreeT $ do -- With 'next' we will inspect p's first chunk, excluding spaces;
302> x <- next (p >-> dropWhile isSpace) -- note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'.
303> return $ case x of -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)'
304> Left r -> Pure r -- 'Left' means we got no Text chunk, but only the return value; so we are done.
305> Right (txt, p') -> Free $ do -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad
306> p'' <- view (break isSpace) -- When we apply 'break isSpace', we get a Producer that returns a Producer;
307> (yield txt >> p') -- so here we yield everything up to the next space, and get the rest back.
308> return (words p'') -- We then carry on with the rest, which is likely to begin with space.
309
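    Finally, such a @FreeT@ is more often consumed with the combinators of @Pipes.Group@
    than by pattern matching. A sketch (the name @lineLengths@ is made up here; it assumes
    @folds@ from @Pipes.Group@ and @Data.Text@ imported qualified as @T@): the length of
    each line is computed chunk by chunk, so no line is ever concatenated in memory.

> lineLengths :: Monad m => Producer Text m r -> Producer Int m r
> lineLengths = folds (\n chunk -> n + T.length chunk) 0 id . view Text.lines
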
310-}
diff --git a/examples/attoparser.hs b/examples/attoparser.hs
new file mode 100644
index 0000000..ddf770d
--- /dev/null
+++ b/examples/attoparser.hs
@@ -0,0 +1,21 @@
1import Pipes
2import Pipes.Text.IO (fromHandle)
3import Pipes.Attoparsec (parsed)
4import Data.Attoparsec.Text (Parser, decimal, space, endOfLine)
5import qualified System.IO as IO
6
7data Test = Test {
8  a :: Int,
9  b :: Int
10  } deriving (Show)
11
12testParser :: Parser Test
13testParser = do
14  a <- decimal
15  space
16  b <- decimal
17  endOfLine
18  return $ Test a b
19
20main = IO.withFile "./testfile" IO.ReadMode $ \handle -> runEffect $
21  for (parsed testParser (fromHandle handle)) (lift . print)
diff --git a/examples/decode.hs b/examples/decode.hs
new file mode 100644
index 0000000..8cb44f8
--- /dev/null
+++ b/examples/decode.hs
@@ -0,0 +1,30 @@
1-- http://www.haskellforall.com/2014/02/pipes-parse-30-lens-based-parsing.html
2
3import Data.ByteString (ByteString)
4import Data.Text (Text)
5import Lens.Family.State.Strict (zoom)
6import Pipes
7import Pipes.Parse
8import qualified Pipes.ByteString as ByteString
9import qualified Pipes.Text as Text
10
11-- Retrieve all `Text` chunks up to 10 characters
12parser :: Monad m => Parser ByteString m [Text]
13parser = zoom (Text.decodeUtf8 . Text.splitAt 10) drawAll
14
15main = do
16 (textChunks, leftovers) <- runStateT parser ByteString.stdin
17 print textChunks
18
19 -- Now print the remaining `ByteString` chunks
20 byteChunks <- evalStateT drawAll leftovers
21 print byteChunks
22{-
23$ ./decode
24Hello, 世界!!!<Enter>
25["Hello, \19990\30028!"]
26abcdefg<Enter>
27<Ctrl-D>
28["!!\n","abcdefg\n"]
29
30-} \ No newline at end of file
diff --git a/examples/lines_url.hs b/examples/lines_url.hs
new file mode 100644
index 0000000..b676656
--- /dev/null
+++ b/examples/lines_url.hs
@@ -0,0 +1,37 @@
1{-# LANGUAGE OverloadedStrings #-}
2-- https://gist.github.com/michaelt/88e1fac12876857deefe
3-- following
4-- https://gist.github.com/gelisam/c769d186493221d7ebbe and associated controversy.
5
6module Main where
7
8import Prelude hiding (lines)
9import Lens.Family
10import Pipes
11import Pipes.Group
12import Pipes.HTTP
13import Pipes.Text
14import Pipes.Text.Encoding
15import Pipes.Text.IO (toHandle,stdout)
16import qualified System.IO as IO
17import Data.Functor (void)
18import qualified Data.Text as T
19
20main = do
21 req <- parseUrl "https://gist.github.com/gelisam/c769d186493221d7ebbe"
22 -- "http://www.example.com"
23 -- "http://www.gutenberg.org/files/10/10-h/10-h.htm"
24 withManager tlsManagerSettings $ \m ->
25 withHTTP req m $ \resp -> void $ runEffect $
26 number_lines_of (responseBody resp ^. utf8 . lines) >-> toHandle IO.stdout
27
28number_lines_of :: Monad m => FreeT (Producer Text m) m bad -> Producer Text m bad
29number_lines_of = number_loop (1 :: Int) where
30 number_loop n freeProducers = do
31 freeProducer <- lift $ runFreeT freeProducers
32 case freeProducer of
33 Pure badbytes -> do yield $ T.pack "\n"
34 return badbytes -- these could be inspected ...
35 Free p -> do yield $ T.pack ("\n" ++ show n ++ " ")
36 nextFreeProducers <- p
37 number_loop (n+1) nextFreeProducers
diff --git a/examples/zoom.hs b/examples/zoom.hs
new file mode 100644
index 0000000..3442dc8
--- /dev/null
+++ b/examples/zoom.hs
@@ -0,0 +1,152 @@
1-- this file illustrates several uses of `zoom`
2-- one of them is quadratic in the length of the file
3-- since it has to decode and encode repeatedly,
4-- and is thus no good on long files.
5
6{-# LANGUAGE OverloadedStrings #-}
7{-# LANGUAGE BangPatterns#-}
8{-# LANGUAGE RankNTypes #-}
9import Blaze.ByteString.Builder (Builder, fromByteString, toByteString)
10import Control.Exception (Exception)
11import Control.Monad.Trans.Class (lift)
12import Data.ByteString (ByteString)
13import qualified Data.ByteString as S
14import qualified Data.ByteString.Lazy as L
15import Data.Monoid
16import Data.Text (Text)
17import qualified Data.Text as T
18import qualified Data.Text.Encoding as TEE
19import qualified Data.Text.Lazy as TL
20import qualified Data.Text.Lazy.Encoding as TLE
21
22import Pipes
23import Pipes.Parse
24import qualified Pipes.Prelude as PP
25import qualified Pipes.ByteString as Bytes
26import qualified Pipes.Text as Txt
27import Pipes.Text.Encoding (utf8)
28
29import Control.Lens
30import Control.Lens.Internal.Zoom
31import Control.Monad
32import qualified System.IO as IO
33import Control.Monad.Trans.Maybe
34import Control.Monad.State.Class
35
36main :: IO ()
37main = do -- S.writeFile fp $ contents 10000 -- 10000 cannot be handled by fileParser0 and 1
38 -- parse_file fileParser0 -- pathological
39 -- parse_file fileParser1 -- programs
40 parse_file fileParser2 -- good program
41
42 where
43 parse_file parser = IO.withBinaryFile fp IO.ReadMode $ \h ->
44 do p' <- runEffect $ parseWith parser ( Bytes.fromHandle h ) >-> PP.print
45 runEffect $ p' >-> PP.print
46 parseWith parser = loop where
47 loop p = do (m,p') <- lift (runStateT (runMaybeT parser) p)
48 case m of Nothing -> return p'
49 Just file -> do yield file
50 loop p'
51 fp = "encoded.fileformat"
52 contents n = (toByteString . mconcat . replicate n . encodeFiles) input
53 <> S.pack (replicate 10 250)
54
55
56
57fileParser0, fileParser1, fileParser2 :: Monad m => MaybeT (StateT (Producer ByteString m x) m) File
58fileParser0 = do (name, len) <- zoom utf8 parseText
59 contents <- zoom (Bytes.splitAt len) (lift drawAll)
60 return (File name (S.concat contents))
61 where
62 -- this parser aggregates all Text parsing into one preliminary parser
63 -- which is then applied with `zoom utf8`
64 -- we cannot tell in advance how long, e.g. the file name will be
65 parseText :: Monad m => MaybeT (StateT (Producer Text m x) m) (Text, Int)
66 parseText = do nameLength <- parseNumber
67 names <- zoom (Txt.splitAt nameLength) $ (lift drawAll)
68 contentLength <- parseNumber
69 return $! (T.concat names, contentLength)
70
71-- here we disaggregate the little Text parsers but still apply them with `zoom utf8`
72-- this makes no difference
73fileParser1 = do nameLength <- zoom utf8 parseNumber
74 names <- zoom (utf8 . Txt.splitAt nameLength) (lift drawAll)
75 contentLength <- zoom utf8 parseNumber
76 contents <- zoom (Bytes.splitAt contentLength) (lift drawAll)
77 return (File (T.concat names) (S.concat contents))
78
79-- this is the good program: by reflecting on the fact that file names
80-- should not be 1000 bytes long, nor binary files longer than e.g. 10 ^ 10 bytes,
81-- we can restrict the length of the byte stream to which we apply `zoom utf8`
82fileParser2 = do nameLength <- Bytes.splitAt 3 ~~> utf8 ~~> parseNumber
83 names <- Bytes.splitAt nameLength ~~> utf8 ~~> lift drawAll
84 len <- Bytes.splitAt 10 ~~> utf8 ~~> parseNumber
85 contents <- Bytes.splitAt len ~~> lift drawAll
86 return (File (T.concat names) (S.concat contents))
87
88-- infix lens nonsense
89infixr 1 ~~>
90(~~>) :: Zoom m n s t
91 => ((s -> Zoomed n c s) -> t -> Zoomed n c t)
92 -> m c -> n c
93(~~>) = zoom
94{-# INLINE (~~>) #-}
95
96parseNumber :: Monad m => MaybeT (StateT (Producer Text m x) m) Int
97parseNumber = loop 0 where
98 loop !n = do c <- MaybeT Txt.drawChar
99 case c of ':' -> return n
100 _ -> do guard ('0' <= c && c <= '9')
101 loop $! n * 10 + (fromEnum c - fromEnum '0')
102
103
104
105-- --- Michael S's `File` type and its binary encoding, etc.
106
107
108data File = File
109 { fileName :: !Text
110 , fileContents :: !ByteString
111 }
112 deriving Show
113
114encodeFile :: File -> Builder
115encodeFile (File name contents) =
116 tellLength (S.length bytesname) <>
117 fromByteString bytesname <>
118 tellLength (S.length contents) <>
119 fromByteString contents
120 where
121 tellLength i = fromByteString $ TEE.encodeUtf8 (T.pack (shows i ":"))
122 bytesname = TEE.encodeUtf8 name
123
124encodeFiles :: [File] -> Builder
125encodeFiles = mconcat . map encodeFile
126
127input :: [File]
128input =
129 [ File "utf8.txt" $ TEE.encodeUtf8 "This file is in UTF-8"
130 , File "utf16.txt" $ TEE.encodeUtf16LE "This file is in UTF-16"
131 , File "binary.dat" "we'll pretend to be binary"
132 ]
133
134
135---
136
137-- This desperate scheme actually has some efficacy, if used before `utf8` in a zoom
138-- but not much
139
140chunk :: Monad m => Int -> Lens' (Producer ByteString m r) (Producer ByteString m r)
141chunk n = lens (chunkyN n) (\_ b -> b) where
142
143 chunkyN :: Monad m => Int -> Producer ByteString m r -> Producer ByteString m r
144 chunkyN n = prod_loop where
145
146 prod_loop p = do mbs <- lift $ next p
147 case mbs of Left r -> return r
148 Right (bs, p') -> do bs_loop bs
149 prod_loop p'
150 bs_loop bs = unless (S.null bs) $ do yield fore
151 unless (S.null aft) (bs_loop aft)
152 where (fore, aft) = S.splitAt n bs
diff --git a/pipes-text.cabal b/pipes-text.cabal
index 041f2c7..bcd4911 100644
--- a/pipes-text.cabal
+++ b/pipes-text.cabal
@@ -1,5 +1,5 @@
1name: pipes-text 1name: pipes-text
2version: 0.0.0.12
2version: 0.0.0.14
3synopsis: Text pipes. 3synopsis: Text pipes.
4description: * This package will be in a draft, or testing, phase until version 0.0.1. Please report any installation difficulties, or any wisdom about the api, on the github page or the <https://groups.google.com/forum/#!forum/haskell-pipes pipes list> 4description: * This package will be in a draft, or testing, phase until version 0.0.1. Please report any installation difficulties, or any wisdom about the api, on the github page or the <https://groups.google.com/forum/#!forum/haskell-pipes pipes list>
5 . 5 .
@@ -36,7 +36,7 @@ library
36 exposed-modules: Pipes.Text, Pipes.Text.Encoding 36 exposed-modules: Pipes.Text, Pipes.Text.Encoding
37 build-depends: base >= 4 && < 5 , 37 build-depends: base >= 4 && < 5 ,
38 bytestring >= 0.9.2.1 && < 0.11, 38 bytestring >= 0.9.2.1 && < 0.11,
39 text >= 0.11.2 && < 1.2 ,
39 text >= 0.11.2 && < 1.3 ,
40 streaming-commons >= 0.1 && < 0.2 , 40 streaming-commons >= 0.1 && < 0.2 ,
41 pipes >= 4.0 && < 4.2 , 41 pipes >= 4.0 && < 4.2 ,
42 pipes-group >= 1.0.0 && < 1.1 , 42 pipes-group >= 1.0.0 && < 1.1 ,
@@ -50,6 +50,6 @@ library
50 ghc-options: -O2 50 ghc-options: -O2
51 51
52 if !flag(noio) 52 if !flag(noio)
53 exposed-modules: Pipes.Text.IO
53 exposed-modules: Pipes.Text.IO, Pipes.Text.Tutorial
54 build-depends: text >=0.11.3 && < 1.2
54 build-depends: text >=0.11.3 && < 1.3
55 55