Pipes/Text/Tutorial.hs

   1 {-# OPTIONS_GHC -fno-warn-unused-imports #-}
   2
   3 module Pipes.Text.Tutorial (
   4     -- * Effectful Text
   5     -- $intro
   6
   7     -- ** @Pipes.Text@
   8     -- $pipestext
   9
  10     -- ** @Pipes.Text.IO@
  11     -- $pipestextio
  12
  13     -- ** @Pipes.Text.Encoding@
  14     -- $pipestextencoding
  15
  16     -- ** Implicit chunking
  17     -- $chunks
  18
  19     -- * Lenses
  20     -- $lenses
  21
  22     -- ** @view@ \/ @(^.)@
  23     -- $view
  24
  25     -- ** @over@ \/ @(%~)@
  26     -- $over
  27
  28     -- ** @zoom@
  29     -- $zoom
  30
  31
  32
  33
  34     -- * Special types: @Producer Text m (Producer Text m r)@ and @FreeT (Producer Text m) m r@
  35     -- $special
  36     ) where
  37
  38 import Pipes
  39 import Pipes.Text
  40 import Pipes.Text.IO
  41 import Pipes.Text.Encoding
  42
  43 {- $intro
  44     This package provides @pipes@ utilities for /character streams/,
  45     realized as streams of 'Text' chunks. The individual chunks are uniformly /strict/,
  46     and thus the @Text@ type we are using is the one from @Data.Text@, not @Data.Text.Lazy@
  47     But the type @Producer Text m r@, as we are using it, is a sort of /pipes/ equivalent of
  48     the lazy @Text@ type.
  49 -}
  50
  51 {- $pipestext
  52     The main @Pipes.Text@ module provides many functions equivalent
  53     in one way or another to the pure functions in
  54     <https://hackage.haskell.org/package/text-1.1.0.0/docs/Data-Text-Lazy.html Data.Text.Lazy>
  55     (and the corresponding @Prelude@ functions for @String@ s): they transform,
  56     divide, group and fold text streams. Though @Producer Text m r@
  57     is the type of \'effectful Text\', the functions in @Pipes.Text@ are \'pure\'
  58     in the sense that they are uniformly monad-independent.
  59 -}
  60
  61 {- $pipestextencoding
  62     In the @text@ library, @Data.Text.Lazy.Encoding@
  63     handles inter-operation with @Data.ByteString.Lazy@. Here, @Pipes.Text.Encoding@
  64     provides for interoperation with the \'effectful ByteStrings\' of @Pipes.ByteString@.
  65 -}
  66
  67 {- $pipestextio
  68     Simple /IO/ operations are defined in @Pipes.Text.IO@ - as lazy IO @Text@
  69     operations are in @Data.Text.Lazy.IO@. It is generally
  70 -}
  71
  72
  73 {- $chunks
  74     Remember that the @Text@ type exported by @Data.Text.Lazy@ is basically
  75     that of a lazy list of strict @Text@: the implementation is arranged so that
  76     the individual strict 'Text' chunks are kept to a reasonable size; the user
  77     is not aware of the divisions between the connected 'Text' chunks, but uses
  78     operations akin to those for strict text.
  79
  80     So also here: the operations in @Pipes.Text@ are designed to operate on character streams that
  81     in a way that is independent of the boundaries of the underlying @Text@ chunks.
  82     This means that they may freely split text into smaller texts and /discard empty texts/.
  83     The objective, though, is that they should not /concatenate texts/ in order to provide strict upper
  84     bounds on memory usage.
  85
  86     For example, to stream only the first three lines of 'stdin' to 'stdout' you
  87     might write:
  88
  89 > import Pipes
  90 > import qualified Pipes.Text as Text
  91 > import qualified Pipes.Text.IO as Text
  92 > import Pipes.Group (takes')
  93 > import Lens.Family (view, (%~)) -- or, Control.Lens
  94 >
  95 > main = runEffect $ takeLines 3 Text.stdin >-> Text.stdout
  96 >   where
  97 >     takeLines n = view Text.unlines . takes' n . view Text.lines
  98 > -- or equivalently: Text.unlines %~ takes' n
  99
 100      This program will not bring more into memory than what @Text.stdin@ considers
 101      one chunk of text (~ 32 KB), even if individual lines are split
 102      across many chunks.  The division into lines does not join Text fragments.
 103
 104 -}
 105
 106
 107 {- $lenses
 108     As the use of @view@ in this example shows, one superficial difference from @Data.Text.Lazy@
 109     is that many of the operations, like 'lines', are \'lensified\'; this has a
 110     number of advantages; in particular it facilitates their use with 'Parser's of Text
 111     (in the general <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html pipes-parse>
 112     sense.) The remarks that follow in this section are for non-lens adepts.
 113
 114     Each lens exported here, e.g. 'lines', 'chunksOf' or 'splitAt', reduces to the
 115     intuitively corresponding function when used with @view@ or @(^.)@. Instead of
 116     writing:
 117
 118     > splitAt 17 producer
 119
 120     as we would with the Prelude or Text functions called @splitAt@, we write
 121
 122     > view (splitAt 17) producer
 123
 124     or equivalently
 125
 126     > producer ^. splitAt 17
 127
 128     This may seem a little indirect, but note that many equivalents of
 129     @Text -> Text@ functions are exported here as 'Pipe's. Here too we recover the intuitively
 130     corresponding functions by prefixing them with @(>->)@. Thus something like
 131
 132 >  stripLines =  view Text.unlines . Group.maps (>-> Text.stripStart) . view Text.lines
 133
 134     would drop the leading white space from each line.
 135
 136     The lenses in this library are marked as /improper/; this just means that
 137     they don't admit all the operations of an ideal lens, but only /getting/ and /focusing/.
 138     Just for this reason, though, the magnificent complexities of the lens libraries
 139     are a distraction. The lens combinators to keep in mind, the ones that make sense for
 140     our lenses, are @view@, @over@, and @zoom@.
 141
 142     One need only keep in mind that if @l@ is a @Lens' a b@, then:
 143
 144 -}
 145 {- $view
 146     @view l@ is a function @a -> b@ . Thus @view l a@ (also written @a ^. l@ )
 147     is the corresponding @b@; as was said above, this function will typically be
 148     the pipes equivalent of the function you think it is, given its name. So for example
 149
 150     > view (Text.splitAt 300) :: Producer Text m r -> Producer Text (Producer Text m r)
 151     > Text.stdin ^. splitAt 300 :: Producer Text IO (Producer Text IO r)
 152
 153     I.e., it produces the first 300 characters, and returns the rest of the producer.
 154     Thus to uppercase the first n characters
 155     of a Producer, leaving the rest the same, we could write:
 156
 157     > upper n p = do p' <- p ^. Text.splitAt n >-> Text.toUpper
 158     >                p'
 159
 160     or equivalently:
 161
 162     > upper n p = join (p ^. Text.splitAt n >-> Text.toUpper)
 163
 164 -}
 165 {- $over
 166     If @l@ is a @Lens a b@, @over l@ is a function @(b -> b) -> a -> a@.
 167     Thus, given a function that modifies
 168     @b@s, the lens lets us modify an @a@ by applying @f :: b -> b@ to
 169     the @b@ that we \"see\" in the @a@ through the lens.
 170     So the type of @over l f@ is @a -> a@ for the concrete type @a@
 171     (it can also be written @l %~ f@).
 172     For any particular @a@, then, @over l f a@ or @(l %~ f) a@ is a revised @a@.
 173     So above we might have written things like these:
 174
 175     > stripLines = over Text.lines (maps (>-> Text.stripStart))
 176     > stripLines = Text.lines %~ maps (>-> Text.stripStart)
 177     > upper n    =  Text.splitAt n %~ (>-> Text.toUpper)
 178 -}
 179
 180 {- $zoom
 181     @zoom l@, finally, is a function from a @Parser b m r@
 182     to a @Parser a m r@ (or more generally a @StateT (Producer b m x) m r@).
 183     Its use is easiest to see with an decoding lens like 'utf8', which
 184     \"sees\" a Text producer hidden inside a ByteString producer:
 185     @drawChar@ is a Text parser, returning a @Maybe Char@, @zoom utf8 drawChar@ is
 186     a /ByteString/ parser, returning a @Maybe Char@. @drawAll@ is a Parser that returns
 187     a list of everything produced from a Producer, leaving only the return value; it would
 188     usually be unreasonable to use it. But @zoom (splitAt 17) drawAll@
 189     returns a list of Text chunks containing the first seventeen Chars, and returns the rest of
 190     the Text Producer for further parsing. Suppose that we want, inexplicably, to
 191     modify the casing of a Text Producer according to any instruction it might
 192     contain at the start. Then we might write something like this:
 193
 194 >     obey :: Monad m => Producer Text m b -> Producer Text m b
 195 >     obey p = do (ts, p') <- lift $ runStateT (zoom (Text.splitAt 7) drawAll) p
 196 >                 let seven = T.concat ts
 197 >                 case T.toUpper seven of
 198 >                    "TOUPPER" -> p' >-> Text.toUpper
 199 >                    "TOLOWER" -> p' >-> Text.toLower
 200 >                    _         -> do yield seven
 201 >                                    p'
 202
 203
 204 > -- > let doc = each ["toU","pperTh","is document.\n"]
 205 > -- > runEffect $ obey doc >-> Text.stdout
 206 > -- THIS DOCUMENT.
 207
 208     The purpose of exporting lenses is the mental economy achieved with this three-way
 209     applicability. That one expression, e.g. @lines@ or @splitAt 17@ can have these
 210     three uses is no more surprising than that a pipe can act as a function modifying
 211     the output of a producer, namely by using @>->@ to its left: @producer >-> pipe@
 212     -- but can /also/ modify the inputs to a consumer by using @>->@ to its right:
 213     @pipe >-> consumer@
 214
 215     The three functions, @view@ \/ @(^.)@, @over@ \/ @(%~)@ and @zoom@ are supplied by
 216     both <http://hackage.haskell.org/package/lens lens> and
 217     <http://hackage.haskell.org/package/lens-family lens-family> The use of 'zoom' is explained
 218     in <http://hackage.haskell.org/package/pipes-parse-3.0.1/docs/Pipes-Parse-Tutorial.html Pipes.Parse.Tutorial>
 219     and to some extent in the @Pipes.Text.Encoding@ module here.
 220
 221 -}
 222
 223 {- $special
 224     The simple programs using the 'lines' lens reveal a more important difference from @Data.Text.Lazy@ .
 225     This is in the types that are most closely associated with our central text type,
 226     @Producer Text m r@.  In @Data.Text@ and @Data.Text.Lazy@ we find functions like
 227
 228 >   splitAt  :: Int -> Text -> (Text, Text)
 229 >   lines    ::        Text -> [Text]
 230 >   chunksOf :: Int -> Text -> [Text]
 231
 232     which relate a Text with a pair of Texts or a list of Texts.
 233     The corresponding functions here (taking account of \'lensification\') are
 234
 235 >   view . splitAt  :: (Monad m, Integral n) => n -> Producer Text m r -> Producer Text m (Producer Text m r)
 236 >   view lines      :: Monad m               =>      Producer Text m r -> FreeT (Producer Text m) m r
 237 >   view . chunksOf :: (Monad m, Integral n) => n -> Producer Text m r -> FreeT (Producer Text m) m r
 238
 239     Some of the types may be more readable if you imagine that we have introduced
 240     our own type synonyms
 241
 242 >   type Text m r  = Producer T.Text m r
 243 >   type Texts m r = FreeT (Producer T.Text m) m r
 244
 245     Then we would think of the types above as
 246
 247 >   view . splitAt  :: (Monad m, Integral n) => n -> Text m r -> Text m (Text m r)
 248 >   view lines      :: (Monad m)             =>      Text m r -> Texts m r
 249 >   view . chunksOf :: (Monad m, Integral n) => n -> Text m r -> Texts m r
 250
 251     which brings one closer to the types of the similar functions in @Data.Text.Lazy@
 252
 253     In the type @Producer Text m (Producer Text m r)@ the second
 254     element of the \'pair\' of effectful Texts cannot simply be retrieved
 255     with something like 'snd'. This is an \'effectful\' pair, and one must work
 256     through the effects of the first element to arrive at the second Text stream, even
 257     if you are proposing to throw the Text in the first element away.
 258     Note that we use Control.Monad.join to fuse the pair back together, since it specializes to
 259
 260 >    join :: Monad m => Producer Text m (Producer m r) -> Producer m r
 261
 262     The return type of 'lines', 'words', 'chunksOf' and the other /splitter/ functions,
 263     @FreeT (Producer m Text) m r@ -- our @Texts m r@ -- is the type of (effectful)
 264     lists of (effectful) texts. The type @([Text],r)@ might be seen to gather
 265     together things of the forms:
 266
 267 > r
 268 > (Text,r)
 269 > (Text, (Text, r))
 270 > (Text, (Text, (Text, r)))
 271 > (Text, (Text, (Text, (Text, r))))
 272 > ...
 273
 274     (We might also have identified the sum of those types with @Free ((,) Text) r@
 275     -- or, more absurdly, @FreeT ((,) Text) Identity r@.)
 276
 277     Similarly, our type @Texts m r@, or @FreeT (Text m) m r@ -- in fact called
 278     @FreeT (Producer Text m) m r@ here -- encompasses all the members of the sequence:
 279
 280 > m r
 281 > Text m r
 282 > Text m (Text m r)
 283 > Text m (Text m (Text m r))
 284 > Text m (Text m (Text m (Text m r)))
 285 > ...
 286
 287     We might have used a more specialized type in place of @FreeT (Producer a m) m r@,
 288     or indeed of @FreeT (Producer Text m) m r@, but it is clear that the correct
 289     result type of 'lines' will be isomorphic to @FreeT (Producer Text m) m r@ .
 290
 291     One might think that
 292
 293 >   lines :: Monad m => Lens' (Producer Text m r) (FreeT (Producer Text m) m r)
 294 >   view . lines :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
 295
 296     should really have the type
 297
 298 >   lines :: Monad m => Pipe Text Text m r
 299
 300     as e.g. 'toUpper' does. But this would spoil the control we are
 301     attempting to maintain over the size of chunks. It is in fact just
 302     as unreasonable to want such a pipe as to want
 303
 304 > Data.Text.Lazy.lines :: Text -> Text
 305
 306     to 'rechunk' the strict Text chunks inside the lazy Text to respect
 307     line boundaries. In fact we have
 308
 309 > Data.Text.Lazy.lines :: Text -> [Text]
 310 > Prelude.lines :: String -> [String]
 311
 312     where the elements of the list are themselves lazy Texts or Strings; the use
 313     of @FreeT (Producer Text m) m r@ is simply the 'effectful' version of this.
 314
 315     The @Pipes.Group@ module, which can generally be imported without qualification,
 316     provides many functions for working with things of type @FreeT (Producer a m) m r@.
 317     In particular it conveniently exports the constructors for @FreeT@ and the associated
 318     @FreeF@ type -- a fancy form of @Either@, namely
 319
 320 > data FreeF f a b = Pure a | Free (f b)
 321
 322     for pattern-matching. Consider the implementation of the 'words' function, or
 323     of the part of the lens that takes us to the words; it is compact but exhibits many
 324     of the points under discussion, including explicit handling of the @FreeT@ and @FreeF@
 325     constuctors.  Keep in mind that
 326
 327 >  newtype FreeT f m a  = FreeT (m (FreeF f a (FreeT f m a)))
 328 >  next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
 329
 330    Thus the @do@ block after the @FreeT@ constructor is in the base monad, e.g. 'IO' or 'Identity';
 331    the later subordinate block, opened by the @Free@ constructor, is in the @Producer@ monad:
 332
 333 > words :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
 334 > words p = FreeT $ do                   -- With 'next' we will inspect p's first chunk, excluding spaces;
 335 >   x <- next (p >-> dropWhile isSpace)  --   note that 'dropWhile isSpace' is a pipe, and is thus *applied* with '>->'.
 336 >   return $ case x of                   -- We use 'return' and so need something of type 'FreeF (Text m) r (Texts m r)'
 337 >     Left   r       -> Pure r           -- 'Left' means we got no Text chunk, but only the return value; so we are done.
 338 >     Right (txt, p') -> Free $ do       -- If we get a chunk and the rest of the producer, p', we enter the 'Producer' monad
 339 >         p'' <- view (break isSpace)    -- When we apply 'break isSpace', we get a Producer that returns a Producer;
 340 >                     (yield txt >> p')  --   so here we yield everything up to the next space, and get the rest back.
 341 >         return (words p'')             -- We then carry on with the rest, which is likely to begin with space.
 342
 343 -}