-rw-r--r-- | Pipes/Text.hs | 91 |
1 files changed, 53 insertions, 38 deletions
diff --git a/Pipes/Text.hs b/Pipes/Text.hs
index 2f69806..b90948f 100644
--- a/Pipes/Text.hs
+++ b/Pipes/Text.hs
@@ -1,26 +1,27 @@ | |||
1 | {-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} | 1 | {-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} |
2 | 2 | ||
3 | {-| This package provides @pipes@ utilities for \'text streams\', which are | 3 | {-| This /package/ provides @pipes@ utilities for /text streams/, which are |
4 | streams of 'Text' chunks. The individual chunks are uniformly @strict@, and thus you | 4 | streams of 'Text' chunks. The individual chunks are uniformly /strict/, and thus you |
5 | will generally want @Data.Text@ in scope. But the type @Producer Text m r@ is | 5 | will generally want @Data.Text@ in scope. But the type @Producer Text m r@ is |
6 | in some ways the pipes equivalent of the lazy @Text@ type. | 6 | in some ways the pipes equivalent of the lazy @Text@ type. |
7 | 7 | ||
8 | This module provides many functions equivalent in one way or another to | 8 | This /module/ provides many functions equivalent in one way or another to |
9 | the 'pure' functions in | 9 | the pure functions in |
10 | <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>. | 10 | <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>. |
11 | They transform, divide, group and fold text streams. Though @Producer Text m r@ | 11 | They transform, divide, group and fold text streams. Though @Producer Text m r@ |
12 | is the type of \'effectful Text\', the functions in this module are \'pure\' | 12 | is the type of \'effectful Text\', the functions in this module are \'pure\' |
13 | in the sense that they are uniformly monad-independent. | 13 | in the sense that they are uniformly monad-independent. |
14 | Simple IO operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@ | 14 | Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@ |
15 | operations are in @Data.Text.Lazy.IO@. Interoperation with @ByteString@ | 15 | operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@ |
16 | is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@. | 16 | is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@. |
17 | 17 | ||
18 | The Text type exported by @Data.Text.Lazy@ is basically '[Text]'. The implementation | 18 | The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of |
19 | is arranged so that the individual strict 'Text' chunks are kept to a reasonable size; | 19 | strict Text: the implementation is arranged so that the individual strict 'Text' |
20 | the user is not aware of the divisions between the connected 'Text' chunks. | 20 | chunks are kept to a reasonable size; the user is not aware of the divisions |
21 | between the connected 'Text' chunks. | ||
21 | So also here: the functions in this module are designed to operate on streams that | 22 | So also here: the functions in this module are designed to operate on streams that |
22 | are insensitive to text boundaries. This means that they may freely split | 23 | are insensitive to text boundaries. This means that they may freely split |
23 | text into smaller texts and /discard empty texts/. However, the objective is | 24 | text into smaller texts and /discard empty texts/. The objective, though, is |
24 | that they should /never concatenate texts/ in order to provide strict upper | 25 | that they should /never concatenate texts/ in order to provide strict upper |
25 | bounds on memory usage. | 26 | bounds on memory usage. |
26 | 27 | ||
@@ -30,66 +31,80 @@ | |||
30 | > import Pipes | 31 | > import Pipes |
31 | > import qualified Pipes.Text as Text | 32 | > import qualified Pipes.Text as Text |
32 | > import qualified Pipes.Text.IO as Text | 33 | > import qualified Pipes.Text.IO as Text |
33 | > import Pipes.Group | 34 | > import Pipes.Group (takes') |
34 | > import Lens.Family | 35 | > import Lens.Family |
35 | > | 36 | > |
36 | > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout | 37 | > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout |
37 | > where | 38 | > where |
38 | > takeLines n = Text.unlines . takes' n . view Text.lines | 39 | > takeLines n = Text.unlines . takes' n . view Text.lines |
39 | > -- or equivalently: | 40 | |
40 | > -- takeLines n = over Text.lines (takes' n) | ||
41 | 41 | ||
42 | The above program will never bring more than one chunk of text (~ 32 KB) into | 42 | The above program will never bring more than one chunk of text (~ 32 KB) into |
43 | memory, no matter how long the lines are. | 43 | memory, no matter how long the lines are. |
44 | 44 | ||
45 | As this example shows, one superficial difference from @Data.Text.Lazy@ | 45 | As this example shows, one superficial difference from @Data.Text.Lazy@ |
46 | is that many of the operations, like 'lines', | 46 | is that many of the operations, like 'lines', |
47 | are \'lensified\'; this has a number of advantages where it is possible, in particular | 47 | are \'lensified\'; this has a number of advantages (where it is possible), in particular |
48 | it facilitates their use with 'Parser's of Text (in the general | 48 | it facilitates their use with 'Parser's of Text (in the general |
49 | <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse> | 49 | <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse> |
50 | sense.) | 50 | sense.) |
51 | Each such expression, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the | 51 | Each such expression, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the |
52 | intuitively corresponding function when used with @view@ or @(^.)@. The lens combinators | 52 | intuitively corresponding function when used with @view@ or @(^.)@. |
53 | you will find indispensible are \'view\'/ '(^.)', 'zoom' and probably 'over', which | 53 | |
54 | Note similarly that many equivalents of 'Text -> Text' functions are exported here as 'Pipe's. | ||
55 | They reduce to the intuitively corresponding functions when used with '(>->)'. Thus something like | ||
56 | |||
57 | > stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines | ||
58 | |||
59 | would drop the leading white space from each line. | ||
60 | |||
61 | The lens combinators | ||
62 | you will find indispensible are \'view\' / '(^.)', 'zoom' and probably 'over'. These | ||
54 | are supplied by both <http://hackage.haskell.org/package/lens lens> and | 63 | are supplied by both <http://hackage.haskell.org/package/lens lens> and |
55 | <http://hackage.haskell.org/package/lens-family lens-family> | 64 | <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained |
65 | in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial> | ||
66 | and to some extent in Pipes.Text.Encoding. The use of | ||
67 | 'over' is simple, illustrated by the fact that we can rewrite @stripLines@ above as | ||
68 | |||
69 | > stripLines = over Text.lines $ maps (>-> stripStart) | ||
56 | 70 | ||
57 | A more important difference the example reveals is in the types closely associated with | 71 | These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ . |
58 | the central type, @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ | 72 | This is in the types that are most closely associated with our central text type, |
59 | we find functions like | 73 | @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like |
60 | 74 | ||
61 | > splitAt :: Int -> Text -> (Text, Text) | 75 | > splitAt :: Int -> Text -> (Text, Text) |
62 | > lines :: Int -> Text -> [Text] | 76 | > lines :: Text -> [Text] |
63 | > chunksOf :: Int -> Text -> [Text] | 77 | > chunksOf :: Int -> Text -> [Text] |
64 | 78 | ||
65 | which relate a Text with a pair or list of Texts. The corresponding functions here (taking | 79 | which relate a Text with a pair of Texts or a list of Texts. |
66 | account of \'lensification\') are | 80 | The corresponding functions here (taking account of \'lensification\') are |
67 | 81 | ||
68 | > view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text.Text m (Producer Text.Text m r) | 82 | > view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r) |
69 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r | 83 | > view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r |
70 | > view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r | 84 | > view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r |
71 | 85 | ||
72 | In the type @Producer Text m (Producer Text m r)@ the second | ||
73 | element of the \'pair\' of of \'effectful Texts\' cannot simply be retrieved | ||
74 | with 'snd'. This is an \'effectful\' pair, and one must work through the effects | ||
75 | of the first element to arrive at the second Text stream. Similarly in @FreeT (Producer Text m) m r@, | ||
76 | which corresponds with @[Text]@, on cannot simply drop 10 Producers and take the others; | ||
77 | we can only get to the ones we want to take by working through their predecessors. | ||
78 | |||
79 | Some of the types may be more readable if you imagine that we have introduced | 86 | Some of the types may be more readable if you imagine that we have introduced |
80 | our own type synonyms | 87 | our own type synonyms |
81 | 88 | ||
82 | > type Text m r = Producer T.Text m r | 89 | > type Text m r = Producer T.Text m r |
83 | > type Texts m r = FreeT (Producer T.Text m) m r | 90 | > type Texts m r = FreeT (Producer T.Text m) m r |
84 | 91 | ||
85 | Then we would think of the types above as | 92 | Then we would think of the types above as |
86 | 93 | ||
87 | > view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) | 94 | > view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) |
88 | > view lines :: (Monad m) => Text m r -> Texts m r | 95 | > view lines :: (Monad m) => Text m r -> Texts m r |
89 | > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r | 96 | > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r |
90 | 97 | ||
91 | which brings one closer to the types of the similar functions in @Data.Text.Lazy@ | 98 | which brings one closer to the types of the similar functions in @Data.Text.Lazy@ |
92 | 99 | ||
100 | In the type @Producer Text m (Producer Text m r)@ the second | ||
101 | element of the \'pair\' of \'effectful Texts\' cannot simply be retrieved | ||
102 | with something like 'snd'. This is an \'effectful\' pair, and one must work | ||
103 | through the effects of the first element to arrive at the second Text stream. | ||
104 | Note that we use Control.Monad.join to fuse the pair back together, since it specializes to | ||
105 | |||
106 | > join :: Producer Text m (Producer m r) -> Producer m r | ||
107 | |||
93 | -} | 108 | -} |
94 | 109 | ||
95 | module Pipes.Text ( | 110 | module Pipes.Text ( |
@@ -294,7 +309,7 @@ toLower = P.map T.toLower | |||
294 | #-} | 309 | #-} |
295 | 310 | ||
296 | -- | uppercase incoming 'Text' | 311 | -- | uppercase incoming 'Text' |
297 | toUpper :: Monad m => Pipe Text Text m () | 312 | toUpper :: Monad m => Pipe Text Text m r |
298 | toUpper = P.map T.toUpper | 313 | toUpper = P.map T.toUpper |
299 | {-# INLINEABLE toUpper #-} | 314 | {-# INLINEABLE toUpper #-} |
300 | 315 | ||