vendor/github.com/ulikunitz/xz/TODO.md

   1 # TODO list
   2
   3 ## Release v0.6
   4
   5 1. Review encoder and check for lzma improvements under xz.
   6 2. Fix binary tree matcher.
   7 3. Compare compression ratio with xz tool using comparable parameters
   8    and optimize parameters
   9 4. Do some optimizations
  10     - rename operation action and make it a simple type of size 8
  11     - make maxMatches, wordSize parameters
  12     - stop searching after a certain length is found (parameter sweetLen)
  13
  14 ## Release v0.7
  15
  16 1. Optimize code
  17 2. Do statistical analysis to get linear presets.
  18 3. Test sync.Pool compatability for xz and lzma Writer and Reader
  19 3. Fuzz optimized code.
  20
  21 ## Release v0.8
  22
  23 1. Support parallel go routines for writing and reading xz files.
  24 2. Support a ReaderAt interface for xz files with small block sizes.
  25 3. Improve compatibility between gxz and xz
  26 4. Provide manual page for gxz
  27
  28 ## Release v0.9
  29
  30 1. Improve documentation
  31 2. Fuzz again
  32
  33 ## Release v1.0
  34
  35 1. Full functioning gxz
  36 2. Add godoc URL to README.md (godoc.org)
  37 3. Resolve all issues.
  38 4. Define release candidates.
  39 5. Public announcement.
  40
  41 ## Package lzma
  42
  43 ### Release v0.6
  44
  45 - Rewrite Encoder into a simple greedy one-op-at-a-time encoder
  46   including
  47     + simple scan at the dictionary head for the same byte
  48     + use the killer byte (requiring matches to get longer, the first
  49       test should be the byte that would make the match longer)
  50
  51
  52 ## Optimizations
  53
  54 - There may be a lot of false sharing in lzma.State; check whether this
  55   can be improved by reorganizing the internal structure of it.
  56 - Check whether batching encoding and decoding improves speed.
  57
  58 ### DAG optimizations
  59
  60 - Use full buffer to create minimal bit-length above range encoder.
  61 - Might be too slow (see v0.4)
  62
  63 ### Different match finders
  64
  65 - hashes with 2, 3 characters additional to 4 characters
  66 - binary trees with 2-7 characters (uint64 as key, use uint32 as
  67   pointers into a an array)
  68 - rb-trees with 2-7 characters (uint64 as key, use uint32 as pointers
  69   into an array with bit-steeling for the colors)
  70
  71 ## Release Procedure
  72
  73 - execute goch -l for all packages; probably with lower param like 0.5.
  74 - check orthography with gospell
  75 - Write release notes in doc/relnotes.
  76 - Update README.md
  77 - xb copyright . in xz directory to ensure all new files have Copyright
  78   header
  79 - VERSION=<version> go generate github.com/ulikunitz/xz/... to update
  80   version files
  81 - Execute test for Linux/amd64, Linux/x86 and Windows/amd64.
  82 - Update TODO.md - write short log entry
  83 - git checkout master && git merge dev
  84 - git tag -a <version>
  85 - git push
  86
  87 ## Log
  88
  89 ### 2018-10-28
  90
  91 Release v0.5.5 fixes issues #19 observing ErrLimit outputs.
  92
  93 ### 2017-06-05
  94
  95 Release v0.5.4 fixes issues #15 of another problem with the padding size
  96 check for the xz block header. I removed the check completely.
  97
  98 ### 2017-02-15
  99
 100 Release v0.5.3 fixes issue #12 regarding the decompression of an empty
 101 XZ stream. Many thanks to Tomasz Kłak, who reported the issue.
 102
 103 ### 2016-12-02
 104
 105 Release v0.5.2 became necessary to allow the decoding of xz files with
 106 4-byte padding in the block header. Many thanks to Greg, who reported
 107 the issue.
 108
 109 ### 2016-07-23
 110
 111 Release v0.5.1 became necessary to fix problems with 32-bit platforms.
 112 Many thanks to Bruno Brigas, who reported the issue.
 113
 114 ### 2016-07-04
 115
 116 Release v0.5 provides improvements to the compressor and provides support for
 117 the decompression of xz files with multiple xz streams.
 118
 119 ### 2016-01-31
 120
 121 Another compression rate increase by checking the byte at length of the
 122 best match first, before checking the whole prefix. This makes the
 123 compressor even faster. We have now a large time budget to beat the
 124 compression ratio of the xz tool. For enwik8 we have now over 40 seconds
 125 to reduce the compressed file size for another 7 MiB.
 126
 127 ### 2016-01-30
 128
 129 I simplified the encoder. Speed and compression rate increased
 130 dramatically. A high compression rate affects also the decompression
 131 speed. The approach with the buffer and optimizing for operation
 132 compression rate has not been successful. Going for the maximum length
 133 appears to be the best approach.
 134
 135 ### 2016-01-28
 136
 137 The release v0.4 is ready. It provides a working xz implementation,
 138 which is rather slow, but works and is interoperable with the xz tool.
 139 It is an important milestone.
 140
 141 ### 2016-01-10
 142
 143 I have the first working implementation of an xz reader and writer. I'm
 144 happy about reaching this milestone.
 145
 146 ### 2015-12-02
 147
 148 I'm now ready to implement xz because, I have a working LZMA2
 149 implementation. I decided today that v0.4 will use the slow encoder
 150 using the operations buffer to be able to go back, if I intend to do so.
 151
 152 ### 2015-10-21
 153
 154 I have restarted the work on the library. While trying to implement
 155 LZMA2, I discovered that I need to resimplify the encoder and decoder
 156 functions. The option approach is too complicated. Using a limited byte
 157 writer and not caring for written bytes at all and not to try to handle
 158 uncompressed data simplifies the LZMA encoder and decoder much.
 159 Processing uncompressed data and handling limits is a feature of the
 160 LZMA2 format not of LZMA.
 161
 162 I learned an interesting method from the LZO format. If the last copy is
 163 too far away they are moving the head one 2 bytes and not 1 byte to
 164 reduce processing times.
 165
 166 ### 2015-08-26
 167
 168 I have now reimplemented the lzma package. The code is reasonably fast,
 169 but can still be optimized. The next step is to implement LZMA2 and then
 170 xz.
 171
 172 ### 2015-07-05
 173
 174 Created release v0.3. The version is the foundation for a full xz
 175 implementation that is the target of v0.4.
 176
 177 ### 2015-06-11
 178
 179 The gflag package has been developed because I couldn't use flag and
 180 pflag for a fully compatible support of gzip's and lzma's options. It
 181 seems to work now quite nicely.
 182
 183 ### 2015-06-05
 184
 185 The overflow issue was interesting to research, however Henry S. Warren
 186 Jr. Hacker's Delight book was very helpful as usual and had the issue
 187 explained perfectly. Fefe's information on his website was based on the
 188 C FAQ and quite bad, because it didn't address the issue of -MININT ==
 189 MININT.
 190
 191 ### 2015-06-04
 192
 193 It has been a productive day. I improved the interface of lzma.Reader
 194 and lzma.Writer and fixed the error handling.
 195
 196 ### 2015-06-01
 197
 198 By computing the bit length of the LZMA operations I was able to
 199 improve the greedy algorithm implementation. By using an 8 MByte buffer
 200 the compression rate was not as good as for xz but already better then
 201 gzip default.
 202
 203 Compression is currently slow, but this is something we will be able to
 204 improve over time.
 205
 206 ### 2015-05-26
 207
 208 Checked the license of ogier/pflag. The binary lzmago binary should
 209 include the license terms for the pflag library.
 210
 211 I added the endorsement clause as used by Google for the Go sources the
 212 LICENSE file.
 213
 214 ### 2015-05-22
 215
 216 The package lzb contains now the basic implementation for creating or
 217 reading LZMA byte streams. It allows the support for the implementation
 218 of the DAG-shortest-path algorithm for the compression function.
 219
 220 ### 2015-04-23
 221
 222 Completed yesterday the lzbase classes. I'm a little bit concerned that
 223 using the components may require too much code, but on the other hand
 224 there is a lot of flexibility.
 225
 226 ### 2015-04-22
 227
 228 Implemented Reader and Writer during the Bayern game against Porto. The
 229 second half gave me enough time.
 230
 231 ### 2015-04-21
 232
 233 While showering today morning I discovered that the design for OpEncoder
 234 and OpDecoder doesn't work, because encoding/decoding might depend on
 235 the current status of the dictionary. This is not exactly the right way
 236 to start the day.
 237
 238 Therefore we need to keep the Reader and Writer design. This time around
 239 we simplify it by ignoring size limits. These can be added by wrappers
 240 around the Reader and Writer interfaces. The Parameters type isn't
 241 needed anymore.
 242
 243 However I will implement a ReaderState and WriterState type to use
 244 static typing to ensure the right State object is combined with the
 245 right lzbase.Reader and lzbase.Writer.
 246
 247 As a start I have implemented ReaderState and WriterState to ensure
 248 that the state for reading is only used by readers and WriterState only
 249 used by Writers.
 250
 251 ### 2015-04-20
 252
 253 Today I implemented the OpDecoder and tested OpEncoder and OpDecoder.
 254
 255 ### 2015-04-08
 256
 257 Came up with a new simplified design for lzbase. I implemented already
 258 the type State that replaces OpCodec.
 259
 260 ### 2015-04-06
 261
 262 The new lzma package is now fully usable and lzmago is using it now. The
 263 old lzma package has been completely removed.
 264
 265 ### 2015-04-05
 266
 267 Implemented lzma.Reader and tested it.
 268
 269 ### 2015-04-04
 270
 271 Implemented baseReader by adapting code form lzma.Reader.
 272
 273 ### 2015-04-03
 274
 275 The opCodec has been copied yesterday to lzma2. opCodec has a high
 276 number of dependencies on other files in lzma2. Therefore I had to copy
 277 almost all files from lzma.
 278
 279 ### 2015-03-31
 280
 281 Removed only a TODO item.
 282
 283 However in Francesco Campoy's presentation "Go for Javaneros
 284 (Javaïstes?)" is the the idea that using an embedded field E, all the
 285 methods of E will be defined on T. If E is an interface T satisfies E.
 286
 287 https://talks.golang.org/2014/go4java.slide#51
 288
 289 I have never used this, but it seems to be a cool idea.
 290
 291 ### 2015-03-30
 292
 293 Finished the type writerDict and wrote a simple test.
 294
 295 ### 2015-03-25
 296
 297 I started to implement the writerDict.
 298
 299 ### 2015-03-24
 300
 301 After thinking long about the LZMA2 code and several false starts, I
 302 have now a plan to create a self-sufficient lzma2 package that supports
 303 the classic LZMA format as well as LZMA2. The core idea is to support a
 304 baseReader and baseWriter type that support the basic LZMA stream
 305 without any headers. Both types must support the reuse of dictionaries
 306 and the opCodec.
 307
 308 ### 2015-01-10
 309
 310 1. Implemented simple lzmago tool
 311 2. Tested tool against large 4.4G file
 312     - compression worked correctly; tested decompression with lzma
 313     - decompression hits a full buffer condition
 314 3. Fixed a bug in the compressor and wrote a test for it
 315 4. Executed full cycle for 4.4 GB file; performance can be improved ;-)
 316
 317 ### 2015-01-11
 318
 319 - Release v0.2 because of the working LZMA encoder and decoder