diff options
-rw-r--r-- | Pipes/Text.hs | 275 |
1 files changed, 169 insertions, 106 deletions
diff --git a/Pipes/Text.hs b/Pipes/Text.hs index 575c987..95fc0e6 100644 --- a/Pipes/Text.hs +++ b/Pipes/Text.hs | |||
@@ -1,113 +1,10 @@ | |||
1 | {-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} | 1 | {-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} |
2 | 2 | ||
3 | {-| This /package/ provides @pipes@ utilities for /text streams/, which are | ||
4 | streams of 'Text' chunks. The individual chunks are uniformly /strict/, and thus you | ||
5 | will generally want @Data.Text@ in scope. But the type @Producer Text m r@ is | ||
6 | in some ways the pipes equivalent of the lazy @Text@ type. | ||
7 | |||
8 | This /module/ provides many functions equivalent in one way or another to | ||
9 | the pure functions in | ||
10 | <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>. | ||
11 | They transform, divide, group and fold text streams. Though @Producer Text m r@ | ||
12 | is the type of \'effectful Text\', the functions in this module are \'pure\' | ||
13 | in the sense that they are uniformly monad-independent. | ||
14 | Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@ | ||
15 | operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@ | ||
16 | is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@. | ||
17 | |||
18 | The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of | ||
19 | strict Text: the implementation is arranged so that the individual strict 'Text' | ||
20 | chunks are kept to a reasonable size; the user is not aware of the divisions | ||
21 | between the connected 'Text' chunks. | ||
22 | So also here: the functions in this module are designed to operate on streams that | ||
23 | are insensitive to text boundaries. This means that they may freely split | ||
24 | text into smaller texts and /discard empty texts/. The objective, though, is | ||
25 | that they should /never concatenate texts/ in order to provide strict upper | ||
26 | bounds on memory usage. | ||
27 | |||
28 | For example, to stream only the first three lines of 'stdin' to 'stdout' you | ||
29 | might write: | ||
30 | |||
31 | > import Pipes | ||
32 | > import qualified Pipes.Text as Text | ||
33 | > import qualified Pipes.Text.IO as Text | ||
34 | > import Pipes.Group (takes') | ||
35 | > import Lens.Family | ||
36 | > | ||
37 | > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout | ||
38 | > where | ||
39 | > takeLines n = Text.unlines . takes' n . view Text.lines | ||
40 | |||
41 | |||
42 | The above program will never bring more than one chunk of text (~ 32 KB) into | ||
43 | memory, no matter how long the lines are. | ||
44 | |||
45 | As this example shows, one superficial difference from @Data.Text.Lazy@ | ||
46 | is that many of the operations, like 'lines', | ||
47 | are \'lensified\'; this has a number of advantages (where it is possible), in particular | ||
48 | it facilitates their use with 'Parser's of Text (in the general | ||
49 | <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse> | ||
50 | sense.) | ||
51 | Each such expression, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the | ||
52 | intuitively corresponding function when used with @view@ or @(^.)@. | ||
53 | |||
54 | Note similarly that many equivalents of 'Text -> Text' functions are exported here as 'Pipe's. | ||
55 | They reduce to the intuitively corresponding functions when used with '(>->)'. Thus something like | ||
56 | |||
57 | > stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines | ||
58 | |||
59 | would drop the leading white space from each line. | ||
60 | |||
61 | The lens combinators | ||
62 | you will find indispensible are \'view\' / '(^.)', 'zoom' and probably 'over'. These | ||
63 | are supplied by both <http://hackage.haskell.org/package/lens lens> and | ||
64 | <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained | ||
65 | in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial> | ||
66 | and to some extent in Pipes.Text.Encoding. The use of | ||
67 | 'over' is simple, illustrated by the fact that we can rewrite @stripLines@ above as | ||
68 | |||
69 | > stripLines = over Text.lines $ maps (>-> stripStart) | ||
70 | |||
71 | These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ . | ||
72 | This is in the types that are most closely associated with our central text type, | ||
73 | @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like | ||
74 | |||
75 | > splitAt :: Int -> Text -> (Text, Text) | ||
76 | > lines :: Text -> [Text] | ||
77 | > chunksOf :: Int -> Text -> [Text] | ||
78 | |||
79 | which relate a Text with a pair of Texts or a list of Texts. | ||
80 | The corresponding functions here (taking account of \'lensification\') are | ||
81 | |||
82 | > view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r) | ||
83 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | ||
84 | > view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r | ||
85 | |||
86 | Some of the types may be more readable if you imagine that we have introduced | ||
87 | our own type synonyms | ||
88 | |||
89 | > type Text m r = Producer T.Text m r | ||
90 | > type Texts m r = FreeT (Producer T.Text m) m r | ||
91 | |||
92 | Then we would think of the types above as | ||
93 | |||
94 | > view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) | ||
95 | > view lines :: (Monad m) => Text m r -> Texts m r | ||
96 | > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r | ||
97 | |||
98 | which brings one closer to the types of the similar functions in @Data.Text.Lazy@ | ||
99 | |||
100 | In the type @Producer Text m (Producer Text m r)@ the second | ||
101 | element of the \'pair\' of \'effectful Texts\' cannot simply be retrieved | ||
102 | with something like 'snd'. This is an \'effectful\' pair, and one must work | ||
103 | through the effects of the first element to arrive at the second Text stream. | ||
104 | Note that we use Control.Monad.join to fuse the pair back together, since it specializes to | ||
105 | |||
106 | > join :: Producer Text m (Producer m r) -> Producer m r | ||
107 | |||
108 | -} | ||
109 | 3 | ||
110 | module Pipes.Text ( | 4 | module Pipes.Text ( |
5 | -- * Introduction | ||
6 | -- $intro | ||
7 | |||
111 | -- * Producers | 8 | -- * Producers |
112 | fromLazy | 9 | fromLazy |
113 | 10 | ||
@@ -236,6 +133,172 @@ import Prelude hiding ( | |||
236 | words, | 133 | words, |
237 | writeFile ) | 134 | writeFile ) |
238 | 135 | ||
136 | {- $intro | ||
137 | |||
138 | * /Effectful Text/ | ||
139 | |||
140 | This package provides @pipes@ utilities for /text streams/, understood as | ||
141 | streams of 'Text' chunks. The individual chunks are uniformly /strict/, and thus you | ||
142 | will generally want @Data.Text@ in scope. But the type @Producer Text m r@ as we | ||
143 | are using it is a sort of pipes equivalent of the lazy @Text@ type. | ||
144 | |||
145 | This particular module provides many functions equivalent in one way or another to | ||
146 | the pure functions in | ||
147 | <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>. | ||
148 | They transform, divide, group and fold text streams. Though @Producer Text m r@ | ||
149 | is the type of \'effectful Text\', the functions in this module are \'pure\' | ||
150 | in the sense that they are uniformly monad-independent. | ||
151 | Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@ | ||
152 | operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@ | ||
153 | is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@. | ||
154 | |||
155 | The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of | ||
156 | strict Text: the implementation is arranged so that the individual strict 'Text' | ||
157 | chunks are kept to a reasonable size; the user is not aware of the divisions | ||
158 | between the connected 'Text' chunks. | ||
159 | So also here: the functions in this module are designed to operate on streams that | ||
160 | are insensitive to text boundaries. This means that they may freely split | ||
161 | text into smaller texts and /discard empty texts/. The objective, though, is | ||
162 | that they should /never concatenate texts/ in order to provide strict upper | ||
163 | bounds on memory usage. | ||
164 | |||
165 | For example, to stream only the first three lines of 'stdin' to 'stdout' you | ||
166 | might write: | ||
167 | |||
168 | > import Pipes | ||
169 | > import qualified Pipes.Text as Text | ||
170 | > import qualified Pipes.Text.IO as Text | ||
171 | > import Pipes.Group (takes') | ||
172 | > import Lens.Family | ||
173 | > | ||
174 | > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout | ||
175 | > where | ||
176 | > takeLines n = Text.unlines . takes' n . view Text.lines | ||
177 | |||
178 | The above program will never bring more than one chunk of text (~ 32 KB) into | ||
179 | memory, no matter how long the lines are. | ||
180 | |||
181 | * /Lenses/ | ||
182 | |||
183 | As this example shows, one superficial difference from @Data.Text.Lazy@ | ||
184 | is that many of the operations, like 'lines', | ||
185 | are \'lensified\'; this has a number of advantages (where it is possible), in particular | ||
186 | it facilitates their use with 'Parser's of Text (in the general | ||
187 | <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse> | ||
188 | sense.) | ||
189 | Each such lens, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the | ||
190 | intuitively corresponding function when used with @view@ or @(^.)@. | ||
191 | |||
192 | Note similarly that many equivalents of 'Text -> Text' functions are exported here as 'Pipe's. | ||
193 | They reduce to the intuitively corresponding functions when used with '(>->)'. Thus something like | ||
194 | |||
195 | > stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines | ||
196 | |||
197 | would drop the leading white space from each line. | ||
198 | |||
199 | The lens combinators | ||
200 | you will find indispensible are @view@ / @(^.)@), @zoom@ and probably @over@. These | ||
201 | are supplied by both <http://hackage.haskell.org/package/lens lens> and | ||
202 | <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained | ||
203 | in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial> | ||
204 | and to some extent in the @Pipes.Text.Encoding@ module here. The use of | ||
205 | @over@ is simple, illustrated by the fact that we can rewrite @stripLines@ above as | ||
206 | |||
207 | > stripLines = over Text.lines $ maps (>-> stripStart) | ||
208 | |||
209 | |||
210 | * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@ | ||
211 | |||
212 | These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ . | ||
213 | This is in the types that are most closely associated with our central text type, | ||
214 | @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like | ||
215 | |||
216 | > splitAt :: Int -> Text -> (Text, Text) | ||
217 | > lines :: Text -> [Text] | ||
218 | > chunksOf :: Int -> Text -> [Text] | ||
219 | |||
220 | which relate a Text with a pair of Texts or a list of Texts. | ||
221 | The corresponding functions here (taking account of \'lensification\') are | ||
222 | |||
223 | > view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r) | ||
224 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | ||
225 | > view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r | ||
226 | |||
227 | Some of the types may be more readable if you imagine that we have introduced | ||
228 | our own type synonyms | ||
229 | |||
230 | > type Text m r = Producer T.Text m r | ||
231 | > type Texts m r = FreeT (Producer T.Text m) m r | ||
232 | |||
233 | Then we would think of the types above as | ||
234 | |||
235 | > view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) | ||
236 | > view lines :: (Monad m) => Text m r -> Texts m r | ||
237 | > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r | ||
238 | |||
239 | which brings one closer to the types of the similar functions in @Data.Text.Lazy@ | ||
240 | |||
241 | In the type @Producer Text m (Producer Text m r)@ the second | ||
242 | element of the \'pair\' of effectful Texts cannot simply be retrieved | ||
243 | with something like 'snd'. This is an \'effectful\' pair, and one must work | ||
244 | through the effects of the first element to arrive at the second Text stream, even | ||
245 | if you are proposing to throw the Text in the first element away. | ||
246 | Note that we use Control.Monad.join to fuse the pair back together, since it specializes to | ||
247 | |||
248 | > join :: Monad m => Producer Text m (Producer m r) -> Producer m r | ||
249 | |||
250 | The return type of 'lines', 'words', 'chunksOf' and the other "splitter" functions, | ||
251 | @FreeT (Producer m Text) m r@ -- our @Texts m r@ -- is the type of (effectful) | ||
252 | lists of (effectful) texts. The type @([Text],r)@ might be seen to gather | ||
253 | together things of the forms: | ||
254 | |||
255 | > r | ||
256 | > (Text,r) | ||
257 | > (Text, (Text, r)) | ||
258 | > (Text, (Text, (Text, r))) | ||
259 | > (Text, (Text, (Text, (Text, r)))) | ||
260 | > ... | ||
261 | |||
262 | We might also have identified the sum of those types with @Free ((,) Text) r@ | ||
263 | -- or, more absurdly, @FreeT ((,) Text) Identity r@. Similarly, @FreeT (Producer Text m) m r@ | ||
264 | encompasses all the members of the sequence: | ||
265 | |||
266 | > m r | ||
267 | > Producer Text m r | ||
268 | > Producer Text m (Producer Text m r) | ||
269 | > Producer Text m (Producer Text m (Producer Text m r)) | ||
270 | > ... | ||
271 | |||
272 | One might think that | ||
273 | |||
274 | > lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r) | ||
275 | > view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | ||
276 | |||
277 | should really have the type | ||
278 | |||
279 | > lines :: Monad m => Pipe Text Text m r | ||
280 | |||
281 | as e.g. 'toUpper' does. But this would spoil the control we are | ||
282 | attempting to maintain over the size of chunks. It is in fact just | ||
283 | as unreasonable to want such a pipe as to want | ||
284 | |||
285 | > Data.Text.Lazy.lines :: Text -> Text | ||
286 | |||
287 | to 'rechunk' the strict Text chunks inside the lazy Text to respect | ||
288 | line boundaries. In fact we have | ||
289 | |||
290 | > Data.Text.Lazy.lines :: Text -> [Text] | ||
291 | > Prelude.lines :: String -> [String] | ||
292 | |||
293 | where the elements of the list are themselves lazy Texts or Strings; the use | ||
294 | of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this. | ||
295 | |||
296 | The @Pipes.Group@ module, which can generally be imported without qualification, | ||
297 | provides many functions for working with things of type @FreeT (Producer a m) m r@ | ||
298 | |||
299 | |||
300 | -} | ||
301 | |||
239 | -- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's | 302 | -- | Convert a lazy 'TL.Text' into a 'Producer' of strict 'Text's |
240 | fromLazy :: (Monad m) => TL.Text -> Producer' Text m () | 303 | fromLazy :: (Monad m) => TL.Text -> Producer' Text m () |
241 | fromLazy = foldrChunks (\e a -> yield e >> a) (return ()) | 304 | fromLazy = foldrChunks (\e a -> yield e >> a) (return ()) |