Pipes/Text/Tutorial.hs

   1 {-# OPTIONS_GHC -fno-warn-unused-imports #-}
   2
   3 module Pipes.Text.Tutorial (
   4     -- * Effectful Text
   5     -- $intro
   6     -- ** @Pipes.Text@
   7     -- $pipestext
   8     -- ** @Pipes.Text.IO@
   9     -- $pipestextio
  10     -- ** @Pipes.Text.Encoding@
  11     -- $pipestextencoding
  12     -- * Lenses
  13     -- $lenses
  14
  15     -- ** @view@ \/ @(^.)@
  16     -- $view
  17
  18     -- ** @over@ \/ @(%~)@
  19     -- $over
  20
  21     -- ** @zoom@
  22     -- $zoom
  23
  24     -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@
  25     -- $special
  26     ) where
  27
  28 import Pipes
  29 import Pipes.Text
  30 import Pipes.Text.IO
  31 import Pipes.Text.Encoding
  32
  33 {- $intro
  34     This package provides @pipes@ utilities for /character streams/,
  35     realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/,
  36     and thus the @Text@ type we are using is the one from @Data.Text@, not @Data.Text.Lazy@
  37     But the type @Producer Text m r@, as we are using it, is a sort of /pipes/ equivalent of
  38     the lazy @Text@ type.
  39
  40     The main @Pipes.Text@ module provides many functions equivalent
  41     in one way or another to the pure functions in
  42     <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>
  43     (and the corresponding @Prelude@ functions for @String@ s): they transform,
  44     divide, group and fold text streams. Though @Producer Text m r@
  45     is the type of \'effectful Text\', the functions in @Pipes.Text@ are \'pure\'
  46     in the sense that they are uniformly monad-independent.
  47     Simple /IO/ operations are defined in @Pipes.Text.IO@ - as lazy IO @Text@
  48     operations are in @Data.Text.Lazy.IO@. Similarly, as @Data.Text.Lazy.Encoding@
  49     handles inter-operation with @Data.ByteString.Lazy@, @Pipes.Text.Encoding@ provides for
  50     interoperation with the \'effectful ByteStrings\' of @Pipes.ByteString@.
  51
  52     Remember that the @Text@ type exported by @Data.Text.Lazy@ is basically
  53     that of a lazy list of strict @Text@: the implementation is arranged so that
  54     the individual strict 'Text' chunks are kept to a reasonable size; the user
  55     is not aware of the divisions between the connected 'Text' chunks, but uses
  56     operations akin to those for strict text.
  57     So also here: the functions in this module are designed to operate on character streams that
  58     in a way that is independent of the boundaries of the underlying @Text@ chunks.
  59     This means that they may freely split text into smaller texts and /discard empty texts/.
  60     The objective, though, is that they should not /concatenate texts/ in order to provide strict upper
  61     bounds on memory usage.
  62
  63     For example, to stream only the first three lines of 'stdin' to 'stdout' you
  64     might write:
  65
  66 > import Pipes
  67 > import qualified Pipes.Text as Text
  68 > import qualified Pipes.Text.IO as Text
  69 > import Pipes.Group (takes')
  70 > import Lens.Family (view)
  71 >
  72 > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout
  73 >   where
  74 >     takeLines n = view Text.unlines . takes' n . view Text.lines
  75
  76      This program will never bring more into memory than what @Text.stdin@ considers
  77      one chunk of text (~ 32 KB), even if individual lines are split across many chunks.
  78
  79 -}
  80 {- $lenses
  81     As the use of @view@ in this example shows, one superficial difference from @Data.Text.Lazy@
  82     is that many of the operations, like 'lines', are \'lensified\'; this has a
  83     number of advantages; in particular it facilitates their use with 'Parser's of Text
  84     (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse>
  85     sense.) The remarks that follow in this section are for non-lens adepts.
  86
  87     Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the
  88     intuitively corresponding function when used with @view@ or @(^.)@. Instead of
  89     writing:
  90
  91     > splitAt 17 producer
  92
  93     as we would with the Prelude or Text functions, we write
  94
  95     > view (splitAt 17) producer
  96
  97     or equivalently
  98
  99     > producer ^. splitAt 17
 100
 101     This may seem a little indirect, but note that many equivalents of
 102     @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively
 103     corresponding functions by prefixing them with @(>->)@. Thus something like
 104
 105 >  stripLines =  view Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines
 106
 107     would drop the leading white space from each line.
 108
 109     The lenses in this library are marked as /improper/; this just means that
 110     they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/.
 111     Just for this reason, though, the magnificent complexities of the lens libraries
 112     are a distraction. The lens combinators to keep in mind, the ones that make sense for
 113     our lenses, are @view@ \/ @(^.)@), @over@ \/ @(%~)@ , and @zoom@.
 114
 115     One need only keep in mind that if @l@ is a @Lens' a b@, then:
 116
 117 -}
 118 {- $view
 119     @view l@ is a function @a -> b@ . Thus @view l a@ (also written @a ^. l@ )
 120     is the corresponding @b@; as was said above, this function will typically be
 121     the pipes equivalent of the function you think it is, given its name. So for example
 122
 123     > view (Text.drop)
 124     > view (Text.splitAt 300) :: Producer Text m r -> Producer Text (Producer Text m r)
 125     > Text.stdin ^. splitAt 300 :: Producer Text IO (Producer Text IO r)
 126
 127     I.e., it produces the first 300 characters, and returns the rest of the producer.
 128     Thus to uppercase the first n characters
 129     of a Producer, leaving the rest the same, we could write:
 130
 131
 132     > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper
 133     >                p'
 134 -}
 135 {- $over
 136     @over l@ is a function @(b -> b) -> a -> a@.  Thus, given a function that modifies
 137     @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to
 138     the @b@ that we can \"see\" through the lens. So  @over l f :: a -> a@
 139     (it can also be written @l %~ f@).
 140     For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@.
 141     So above we might have written things like these:
 142
 143     > stripLines = Text.lines %~ maps (>-> Text.stripStart)
 144     > stripLines = over Text.lines (maps (>-> Text.stripStart))
 145     > upper n    =  Text.splitAt n %~ (>-> Text.toUpper)
 146
 147 -}
 148 {- $zoom
 149     @zoom l@, finally, is a function from a @Parser b m r@
 150     to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@).
 151     Its use is easiest to see with an decoding lens like 'utf8', which
 152     \"sees\" a Text producer hidden inside a ByteString producer:
 153     @drawChar@ is a Text parser, returning a @Maybe Char@, @zoom utf8 drawChar@ is
 154     a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns
 155     a list of everything produced from a Producer, leaving only the return value; it would
 156     usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@
 157     returns a list of Text chunks containing the first seventeen Chars, and returns the rest of
 158     the Text Producer for further parsing. Suppose that we want, inexplicably, to
 159     modify the casing of a Text Producer according to any instruction it might
 160     contain at the start. Then we might write something like this:
 161
 162 >     obey :: Monad m => Producer Text m b -> Producer Text m b
 163 >     obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p
 164 >                 let seven = T.concat ts
 165 >                 case T.toUpper seven of
 166 >                    "TOUPPER" -> p' >-> Text.toUpper
 167 >                    "TOLOWER" -> p' >-> Text.toLower
 168 >                    _         -> do yield seven
 169 >                                    p'
 170
 171
 172 > >>> let doc = each ["toU","pperTh","is document.\n"]
 173 > >>> runEffect $ obey doc >-> Text.stdout
 174 > THIS DOCUMENT.
 175
 176     The purpose of exporting lenses is the mental economy achieved with this three-way
 177     applicability. That one expression, e.g. @lines@ or @splitAt 17@ can have these
 178     three uses is no more surprising than that a pipe can act as a function modifying
 179     the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@
 180     -- but can /also/ modify the inputs to a consumer by using @>->@ to its right:
 181     @pipe >-> consumer@
 182
 183     The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by
 184     both <http://hackage.haskell.org/package/lens lens> and
 185     <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained
 186     in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial>
 187     and to some extent in the @Pipes.Text.Encoding@ module here.
 188
 189 -}
 190 {- $special
 191     These simple 'lines' examples reveal a more important difference from @Data.Text.Lazy@ .
 192     This is in the types that are most closely associated with our central text type,
 193     @Producer Text m r@.  In @Data.Text@ and @Data.Text.Lazy@ we find functions like
 194
 195 >   splitAt  :: Int -> Text -> (Text, Text)
 196 >   lines    ::        Text -> [Text]
 197 >   chunksOf :: Int -> Text -> [Text]
 198
 199     which relate a Text with a pair of Texts or a list of Texts.
 200     The corresponding functions here (taking account of \'lensification\') are
 201
 202 >   view . splitAt  :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r)
 203 >   view lines      :: Monad m               =>      Producer Text m r -> FreeT (Producer Text m) m r
 204 >   view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r
 205
 206     Some of the types may be more readable if you imagine that we have introduced
 207     our own type synonyms
 208
 209 >   type Text m r  = Producer T.Text m r
 210 >   type Texts m r = FreeT (Producer T.Text m) m r
 211
 212     Then we would think of the types above as
 213
 214 >   view . splitAt  :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r)
 215 >   view lines      :: (Monad m)             =>      Text m r -> Texts m r
 216 >   view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r
 217
 218     which brings one closer to the types of the similar functions in @Data.Text.Lazy@
 219
 220     In the type @Producer Text m (Producer Text m r)@ the second
 221     element of the \'pair\' of effectful Texts cannot simply be retrieved
 222     with something like 'snd'. This is an \'effectful\' pair, and one must work
 223     through the effects of the first element to arrive at the second Text stream, even
 224     if you are proposing to throw the Text in the first element away.
 225     Note that we use Control.Monad.join to fuse the pair back together, since it specializes to
 226
 227 >    join :: Monad m => Producer Text m (Producer m r) -> Producer m r
 228
 229     The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions,
 230     @FreeT (Producer m Text) m r@ -- our @Texts m r@ -- is the type of (effectful)
 231     lists of (effectful) texts. The type @([Text],r)@ might be seen to gather
 232     together things of the forms:
 233
 234 > r
 235 > (Text,r)
 236 > (Text, (Text, r))
 237 > (Text, (Text, (Text, r)))
 238 > (Text, (Text, (Text, (Text, r))))
 239 > ...
 240
 241     (We might also have identified the sum of those types with @Free ((,) Text) r@
 242     -- or, more absurdly, @FreeT ((,) Text) Identity r@.)
 243
 244     Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called
 245     @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence:
 246
 247 > m r
 248 > Text m r
 249 > Text m (Text m r)
 250 > Text m (Text m (Text m r))
 251 > Text m (Text m (Text m (Text m r)))
 252 > ...
 253
 254     We might have used a more specialized type in place of @FreeT (Producer a m) m r@,
 255     or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct
 256     result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@ .
 257
 258     One might think that
 259
 260 >   lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r)
 261 >   view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
 262
 263     should really have the type
 264
 265 >   lines :: Monad m => Pipe Text Text m r
 266
 267     as e.g. 'toUpper' does. But this would spoil the control we are
 268     attempting to maintain over the size of chunks. It is in fact just
 269     as unreasonable to want such a pipe as to want
 270
 271 > Data.Text.Lazy.lines :: Text -> Text
 272
 273     to 'rechunk' the strict Text chunks inside the lazy Text to respect
 274     line boundaries. In fact we have
 275
 276 > Data.Text.Lazy.lines :: Text -> [Text]
 277 > Prelude.lines :: String -> [String]
 278
 279     where the elements of the list are themselves lazy Texts or Strings; the use
 280     of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this.
 281
 282     The @Pipes.Group@ module, which can generally be imported without qualification,
 283     provides many functions for working with things of type @FreeT (Producer a m) m r@.
 284     In particular it conveniently exports the constructors for @FreeT@ and the associated
 285     @FreeF@ type -- a fancy form of @Either@, namely
 286
 287 > data FreeF f a b = Pure a | Free (f b)
 288
 289     for pattern-matching. Consider the implementation of the 'words' function, or
 290     of the part of the lens that takes us to the words; it is compact but exhibits many
 291     of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@
 292     constuctors.  Keep in mind that
 293
 294 >  newtype FreeT f m a  = FreeT (m (FreeF f a (FreeT f m a)))
 295 >  next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
 296
 297    Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity';
 298    the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad:
 299
 300 > words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
 301 > words p = FreeT $ do                   -- With 'next' we will inspect p's first chunk, excluding spaces;
 302 >   x <- next (p >-> dropWhile isSpace)  --   note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'.
 303 >   return $ case x of                   -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)'
 304 >     Left   r       -> Pure r           -- 'Left' means we got no Text chunk, but only the return value; so we are done.
 305 >     Right (txt, p') -> Free $ do       -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad
 306 >         p'' <- view (break isSpace)    -- When we apply 'break isSpace', we get a Producer that returns a Producer;
 307 >                     (yield txt >> p')  --   so here we yield everything up to the next space, and get the rest back.
 308 >         return (words p'')             -- We then carry on with the rest, which is likely to begin with space.
 309
 310 -}