X-Git-Url: https://git.immae.eu/?a=blobdiff_plain;f=Pipes%2FText.hs;fp=Pipes%2FText.hs;h=b90948f99636ab0bc8d72ac6321debe3207446c7;hb=c70edb9ddbdbbd323d5da7cc4d0f72a795d4bb0e;hp=2f698060629d02a49358f25618f0c1f31392a2a8;hpb=02f89dfe9b4787fbad5f3740ed1626203c474a2b;p=github%2Ffretlink%2Ftext-pipes.git diff --git a/Pipes/Text.hs b/Pipes/Text.hs index 2f69806..b90948f 100644 --- a/Pipes/Text.hs +++ b/Pipes/Text.hs @@ -1,26 +1,27 @@ {-# LANGUAGE RankNTypes, TypeFamilies, BangPatterns, Trustworthy #-} -{-| This package provides @pipes@ utilities for \'text streams\', which are - streams of 'Text' chunks. The individual chunks are uniformly @strict@, and thus you +{-| This /package/ provides @pipes@ utilities for /text streams/, which are + streams of 'Text' chunks. The individual chunks are uniformly /strict/, and thus you will generally want @Data.Text@ in scope. But the type @Producer Text m r@ is in some ways the pipes equivalent of the lazy @Text@ type. - This module provides many functions equivalent in one way or another to - the 'pure' functions in + This /module/ provides many functions equivalent in one way or another to + the pure functions in . They transform, divide, group and fold text streams. Though @Producer Text m r@ is the type of \'effectful Text\', the functions in this module are \'pure\' in the sense that they are uniformly monad-independent. - Simple IO operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@ - operations are in @Data.Text.Lazy.IO@. Interoperation with @ByteString@ + Simple /IO/ operations are defined in @Pipes.Text.IO@ -- as lazy IO @Text@ + operations are in @Data.Text.Lazy.IO@. Inter-operation with @ByteString@ is provided in @Pipes.Text.Encoding@, which parallels @Data.Text.Lazy.Encoding@. - The Text type exported by @Data.Text.Lazy@ is basically '[Text]'. The implementation - is arranged so that the individual strict 'Text' chunks are kept to a reasonable size; - the user is not aware of the divisions between the connected 'Text' chunks. + The Text type exported by @Data.Text.Lazy@ is basically that of a lazy list of + strict Text: the implementation is arranged so that the individual strict 'Text' + chunks are kept to a reasonable size; the user is not aware of the divisions + between the connected 'Text' chunks. So also here: the functions in this module are designed to operate on streams that are insensitive to text boundaries. This means that they may freely split - text into smaller texts and /discard empty texts/. However, the objective is + text into smaller texts and /discard empty texts/. The objective, though, is that they should /never concatenate texts/ in order to provide strict upper bounds on memory usage. @@ -30,66 +31,80 @@ > import Pipes > import qualified Pipes.Text as Text > import qualified Pipes.Text.IO as Text -> import Pipes.Group +> import Pipes.Group (takes') > import Lens.Family > > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout > where > takeLines n = Text.unlines . takes' n . view Text.lines -> -- or equivalently: -> -- takeLines n = over Text.lines (takes' n) + The above program will never bring more than one chunk of text (~ 32 KB) into memory, no matter how long the lines are. As this example shows, one superficial difference from @Data.Text.Lazy@ is that many of the operations, like 'lines', - are \'lensified\'; this has a number of advantages where it is possible, in particular + are \'lensified\'; this has a number of advantages (where it is possible), in particular it facilitates their use with 'Parser's of Text (in the general sense.) Each such expression, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the - intuitively corresponding function when used with @view@ or @(^.)@. The lens combinators - you will find indispensible are \'view\'/ '(^.)', 'zoom' and probably 'over', which + intuitively corresponding function when used with @view@ or @(^.)@. + + Note similarly that many equivalents of 'Text -> Text' functions are exported here as 'Pipe's. + They reduce to the intuitively corresponding functions when used with '(>->)'. Thus something like + +> stripLines = Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines + + would drop the leading white space from each line. + + The lens combinators + you will find indispensible are \'view\' / '(^.)', 'zoom' and probably 'over'. These are supplied by both and - + The use of 'zoom' is explained + in + and to some extent in Pipes.Text.Encoding. The use of + 'over' is simple, illustrated by the fact that we can rewrite @stripLines@ above as + +> stripLines = over Text.lines $ maps (>-> stripStart) - A more important difference the example reveals is in the types closely associated with - the central type, @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ - we find functions like + These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ . + This is in the types that are most closely associated with our central text type, + @Producer Text m r@. In @Data.Text@ and @Data.Text.Lazy@ we find functions like -> splitAt :: Int -> Text -> (Text, Text) -> lines :: Int -> Text -> [Text] +> splitAt :: Int -> Text -> (Text, Text) +> lines :: Text -> [Text] > chunksOf :: Int -> Text -> [Text] - which relate a Text with a pair or list of Texts. The corresponding functions here (taking - account of \'lensification\') are + which relate a Text with a pair of Texts or a list of Texts. + The corresponding functions here (taking account of \'lensification\') are -> view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text.Text m (Producer Text.Text m r) -> view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r -> view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r +> view . splitAt :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r) +> view lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r +> view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r - In the type @Producer Text m (Producer Text m r)@ the second - element of the \'pair\' of of \'effectful Texts\' cannot simply be retrieved - with 'snd'. This is an \'effectful\' pair, and one must work through the effects - of the first element to arrive at the second Text stream. Similarly in @FreeT (Producer Text m) m r@, - which corresponds with @[Text]@, on cannot simply drop 10 Producers and take the others; - we can only get to the ones we want to take by working through their predecessors. - Some of the types may be more readable if you imagine that we have introduced our own type synonyms -> type Text m r = Producer T.Text m r +> type Text m r = Producer T.Text m r > type Texts m r = FreeT (Producer T.Text m) m r Then we would think of the types above as -> view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) -> view lines :: (Monad m) => Text m r -> Texts m r +> view . splitAt :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r) +> view lines :: (Monad m) => Text m r -> Texts m r > view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r which brings one closer to the types of the similar functions in @Data.Text.Lazy@ + In the type @Producer Text m (Producer Text m r)@ the second + element of the \'pair\' of \'effectful Texts\' cannot simply be retrieved + with something like 'snd'. This is an \'effectful\' pair, and one must work + through the effects of the first element to arrive at the second Text stream. + Note that we use Control.Monad.join to fuse the pair back together, since it specializes to + +> join :: Producer Text m (Producer m r) -> Producer m r + -} module Pipes.Text ( @@ -294,7 +309,7 @@ toLower = P.map T.toLower #-} -- | uppercase incoming 'Text' -toUpper :: Monad m => Pipe Text Text m () +toUpper :: Monad m => Pipe Text Text m r toUpper = P.map T.toUpper {-# INLINEABLE toUpper #-}