aboutsummaryrefslogtreecommitdiffhomepage
diff options
context:
space:
mode:
authormichaelt <what_is_it_to_do_anything@yahoo.com>2014-02-21 22:02:25 -0500
committermichaelt <what_is_it_to_do_anything@yahoo.com>2014-02-21 22:02:25 -0500
commite4b6dc671a4b6856f21be3d3b7ffbc189ca73bda (patch)
tree1dba6566f7a93f4da160953a243e7914ea5c8b05
parenta4913c420748559aa8bd4618bc97a562e2bd5b8f (diff)
downloadtext-pipes-e4b6dc671a4b6856f21be3d3b7ffbc189ca73bda.tar.gz
text-pipes-e4b6dc671a4b6856f21be3d3b7ffbc189ca73bda.tar.zst
text-pipes-e4b6dc671a4b6856f21be3d3b7ffbc189ca73bda.zip
finished somewhat wordy documentation for Pipes.Text
-rw-r--r--Pipes/Text.hs275
1 files changed, 169 insertions, 106 deletions
diff --git a/Pipes/Text.hs b/Pipes/Text.hs
index 575c987..95fc0e6 100644
--- a/Pipes/Text.hs
+++ b/Pipes/Text.hs
@@ -1,113 +1,10 @@
1{-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} 1{-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-}
2 2
3{-| This /package/ provides @pipes@ utilities for /text streams/, which are
4 streams of 'Text' chunks. The individual chunks are uniformly /strict/, and thus you
5 will generally want @Data.Text@ in scope. But the type @Producer Text m r@ is
6 in some ways the pipes equivalent of the lazy @Text@ type.
7
8 This /module/ provides many functions equivalent in one way or another to
9 the pure functions in
10 <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>.
11 They transform, divide, group and fold text streams. Though @Producer Text m r@
12 is the type of \'effectful Text\', the functions in this module are \'pure\'
13 in the sense that they are uniformly monad-independent.
14 Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@
15 operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@
16 is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@.
17
18 The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of
19 strict Text: the implementation is arranged so that the individual strict 'Text'
20 chunks are kept to a reasonable size; the user is not aware of the divisions
21 between the connected 'Text' chunks.
22 So also here: the functions in this module are designed to operate on streams that
23 are insensitive to text boundaries. This means that they may freely split
24 text into smaller texts and /discard empty texts/. The objective, though, is
25 that they should /never concatenate texts/ in order to provide strict upper
26 bounds on memory usage.
27
28 For example, to stream only the first three lines of 'stdin' to 'stdout' you
29 might write:
30
31> import Pipes
32> import qualified Pipes.Text as Text
33> import qualified Pipes.Text.IO as Text
34> import Pipes.Group (takes')
35> import Lens.Family
36>
37> main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout
38> where
39> takeLines n = Text.unlines . takes' n . view Text.lines
40
41
42 The above program will never bring more than one chunk of text (~ 32 KB) into
43 memory, no matter how long the lines are.
44
45 As this example shows, one superficial difference from @Data.Text.Lazy@
46 is that many of the operations, like 'lines',
47 are \'lensified\'; this has a number of advantages (where it is possible), in particular
48 it facilitates their use with 'Parser's of Text (in the general
49 <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse>
50 sense.)
51 Each such expression, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the
52 intuitively corresponding function when used with @view@ or @(^.)@.
53
54 Note similarly that many equivalents of 'Text -> Text' functions are exported here as 'Pipe's.
55 They reduce to the intuitively corresponding functions when used with '(>->)'. Thus something like
56
57> stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines
58
59 would drop the leading white space from each line.
60
61 The lens combinators
62 you will find indispensible are \'view\' / '(^.)', 'zoom' and probably 'over'. These
63 are supplied by both <http://hackage.haskell.org/package/lens lens> and
64 <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained
65 in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial>
66 and to some extent in Pipes.Text.Encoding. The use of
67 'over' is simple, illustrated by the fact that we can rewrite @stripLines@ above as
68
69> stripLines = over Text.lines $ maps (>-> stripStart)
70
71 These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ .
72 This is in the types that are most closely associated with our central text type,
73 @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like
74
75> splitAt :: Int -> Text -> (Text, Text)
76> lines :: Text -> [Text]
77> chunksOf :: Int -> Text -> [Text]
78
79 which relate a Text with a pair of Texts or a list of Texts.
80 The corresponding functions here (taking account of \'lensification\') are
81
82> view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r)
83> view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
84> view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r
85
86 Some of the types may be more readable if you imagine that we have introduced
87 our own type synonyms
88
89> type Text m r = Producer T.Text m r
90> type Texts m r = FreeT (Producer T.Text m) m r
91
92 Then we would think of the types above as
93
94> view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r)
95> view lines :: (Monad m) => Text m r -> Texts m r
96> view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r
97
98 which brings one closer to the types of the similar functions in @Data.Text.Lazy@
99
100 In the type @Producer Text m (Producer Text m r)@ the second
101 element of the \'pair\' of \'effectful Texts\' cannot simply be retrieved
102 with something like 'snd'. This is an \'effectful\' pair, and one must work
103 through the effects of the first element to arrive at the second Text stream.
104 Note that we use Control.Monad.join to fuse the pair back together, since it specializes to
105
106> join :: Producer Text m (Producer m r) -> Producer m r
107
108-}
109 3
110module Pipes.Text ( 4module Pipes.Text (
5 -- * Introduction
6 -- $intro
7
111 -- * Producers 8 -- * Producers
112 fromLazy 9 fromLazy
113 10
@@ -236,6 +133,172 @@ import Prelude hiding (
236 words, 133 words,
237 writeFile ) 134 writeFile )
238 135
136{- $intro
137
138 * /Effectful Text/
139
140 This package provides @pipes@ utilities for /text streams/, understood as
141 streams of 'Text' chunks. The individual chunks are uniformly /strict/, and thus you
142 will generally want @Data.Text@ in scope. But the type @Producer Text m r@ as we
143 are using it is a sort of pipes equivalent of the lazy @Text@ type.
144
145 This particular module provides many functions equivalent in one way or another to
146 the pure functions in
147 <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>.
148 They transform, divide, group and fold text streams. Though @Producer Text m r@
149 is the type of \'effectful Text\', the functions in this module are \'pure\'
150 in the sense that they are uniformly monad-independent.
151 Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@
152 operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@
153 is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@.
154
155 The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of
156 strict Text: the implementation is arranged so that the individual strict 'Text'
157 chunks are kept to a reasonable size; the user is not aware of the divisions
158 between the connected 'Text' chunks.
159 So also here: the functions in this module are designed to operate on streams that
160 are insensitive to text boundaries. This means that they may freely split
161 text into smaller texts and /discard empty texts/. The objective, though, is
162 that they should /never concatenate texts/ in order to provide strict upper
163 bounds on memory usage.
164
165 For example, to stream only the first three lines of 'stdin' to 'stdout' you
166 might write:
167
168> import Pipes
169> import qualified Pipes.Text as Text
170> import qualified Pipes.Text.IO as Text
171> import Pipes.Group (takes')
172> import Lens.Family
173>
174> main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout
175> where
176> takeLines n = Text.unlines . takes' n . view Text.lines
177
178 The above program will never bring more than one chunk of text (~ 32 KB) into
179 memory, no matter how long the lines are.
180
181 * /Lenses/
182
183 As this example shows, one superficial difference from @Data.Text.Lazy@
184 is that many of the operations, like 'lines',
185 are \'lensified\'; this has a number of advantages (where it is possible), in particular
186 it facilitates their use with 'Parser's of Text (in the general
187 <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse>
188 sense.)
189 Each such lens, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the
190 intuitively corresponding function when used with @view@ or @(^.)@.
191
192 Note similarly that many equivalents of 'Text -> Text' functions are exported here as 'Pipe's.
193 They reduce to the intuitively corresponding functions when used with '(>->)'. Thus something like
194
195> stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines
196
197 would drop the leading white space from each line.
198
199 The lens combinators
200 you will find indispensible are @view@ / @(^.)@), @zoom@ and probably @over@. These
201 are supplied by both <http://hackage.haskell.org/package/lens lens> and
202 <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained
203 in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial>
204 and to some extent in the @Pipes.Text.Encoding@ module here. The use of
205 @over@ is simple, illustrated by the fact that we can rewrite @stripLines@ above as
206
207> stripLines = over Text.lines $ maps (>-> stripStart)
208
209
210 * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@
211
212 These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ .
213 This is in the types that are most closely associated with our central text type,
214 @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like
215
216> splitAt :: Int -> Text -> (Text, Text)
217> lines :: Text -> [Text]
218> chunksOf :: Int -> Text -> [Text]
219
220 which relate a Text with a pair of Texts or a list of Texts.
221 The corresponding functions here (taking account of \'lensification\') are
222
223> view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r)
224> view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
225> view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r
226
227 Some of the types may be more readable if you imagine that we have introduced
228 our own type synonyms
229
230> type Text m r = Producer T.Text m r
231> type Texts m r = FreeT (Producer T.Text m) m r
232
233 Then we would think of the types above as
234
235> view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r)
236> view lines :: (Monad m) => Text m r -> Texts m r
237> view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r
238
239 which brings one closer to the types of the similar functions in @Data.Text.Lazy@
240
241 In the type @Producer Text m (Producer Text m r)@ the second
242 element of the \'pair\' of effectful Texts cannot simply be retrieved
243 with something like 'snd'. This is an \'effectful\' pair, and one must work
244 through the effects of the first element to arrive at the second Text stream, even
245 if you are proposing to throw the Text in the first element away.
246 Note that we use Control.Monad.join to fuse the pair back together, since it specializes to
247
248> join :: Monad m => Producer Text m (Producer m r) -> Producer m r
249
250 The return type of 'lines', 'words', 'chunksOf' and the other "splitter" functions,
251 @FreeT (Producer m Text) m r@ -- our @Texts m r@ -- is the type of (effectful)
252 lists of (effectful) texts. The type @([Text],r)@ might be seen to gather
253 together things of the forms:
254
255> r
256> (Text,r)
257> (Text, (Text, r))
258> (Text, (Text, (Text, r)))
259> (Text, (Text, (Text, (Text, r))))
260> ...
261
262 We might also have identified the sum of those types with @Free ((,) Text) r@
263 -- or, more absurdly, @FreeT ((,) Text) Identity r@. Similarly, @FreeT (Producer Text m) m r@
264 encompasses all the members of the sequence:
265
266> m r
267> Producer Text m r
268> Producer Text m (Producer Text m r)
269> Producer Text m (Producer Text m (Producer Text m r))
270> ...
271
272 One might think that
273
274> lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r)
275> view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
276
277 should really have the type
278
279> lines :: Monad m => Pipe Text Text m r
280
281 as e.g. 'toUpper' does. But this would spoil the control we are
282 attempting to maintain over the size of chunks. It is in fact just
283 as unreasonable to want such a pipe as to want
284
285> Data.Text.Lazy.lines :: Text -> Text
286
287 to 'rechunk' the strict Text chunks inside the lazy Text to respect
288 line boundaries. In fact we have
289
290> Data.Text.Lazy.lines :: Text -> [Text]
291> Prelude.lines :: String -> [String]
292
293 where the elements of the list are themselves lazy Texts or Strings; the use
294 of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this.
295
296 The @Pipes.Group@ module, which can generally be imported without qualification,
297 provides many functions for working with things of type @FreeT (Producer a m) m r@
298
299
300 -}
301
239-- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's 302-- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's
240fromLazy :: (Monad m) => TL.Text -> Producer' Text m () 303fromLazy :: (Monad m) => TL.Text -> Producer' Text m ()
241fromLazy = foldrChunks (\e a -> yield e >> a) (return ()) 304fromLazy = foldrChunks (\e a -> yield e >> a) (return ())