# TODO list

## Release v0.6

1. Review encoder and check for lzma improvements under xz.
2. Fix binary tree matcher.
3. Compare compression ratio with xz tool using comparable parameters
   and optimize parameters
4. Do some optimizations
    - rename operation action and make it a simple type of size 8
    - make maxMatches, wordSize parameters
    - stop searching after a certain length is found (parameter sweetLen)

## Release v0.7

1. Optimize code
2. Do statistical analysis to get linear presets.
3. Test sync.Pool compatibility for xz and lzma Writer and Reader
4. Fuzz optimized code.

## Release v0.8

1. Support parallel goroutines for writing and reading xz files.
2. Support a ReaderAt interface for xz files with small block sizes.
3. Improve compatibility between gxz and xz
4. Provide manual page for gxz

## Release v0.9

1. Improve documentation
2. Fuzz again

## Release v1.0

1. Fully functioning gxz
2. Add godoc URL to README.md (godoc.org)
3. Resolve all issues.
4. Define release candidates.
5. Public announcement.

## Package lzma

### Release v0.6

- Rewrite Encoder into a simple greedy one-op-at-a-time encoder
  including
    + simple scan at the dictionary head for the same byte
    + use the killer byte (requiring matches to get longer, the first
      test should be the byte that would make the match longer)
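The killer-byte test can be sketched as follows. This is a minimal illustration and not the package's encoder; `matchLen`, `longestMatch` and the candidate list are hypothetical names. A candidate can only beat the current best match if it also matches at offset `best`, so that single byte is compared before the whole prefix.

```go
package main

import "fmt"

// matchLen returns the length of the common prefix of a and b.
func matchLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// longestMatch looks for the longest match for data[pos:] among the
// candidate start positions (all assumed to lie before pos).
// It returns the position and length of the best match, or -1 and 0.
func longestMatch(data []byte, pos int, candidates []int) (bestPos, best int) {
	cur := data[pos:]
	bestPos = -1
	for _, c := range candidates {
		if c < 0 || c >= pos {
			continue // not a valid position in the dictionary
		}
		if best >= len(cur) {
			break // the match cannot get any longer
		}
		// Killer byte: the candidate must match at offset best to be
		// able to produce a longer match than the current one.
		if data[c+best] != cur[best] {
			continue
		}
		if n := matchLen(data[c:], cur); n > best {
			bestPos, best = c, n
		}
	}
	return bestPos, best
}

func main() {
	data := []byte("abcxabcyabcyz")
	fmt.Println(longestMatch(data, 8, []int{0, 4}))
}
```

With the candidates in the order 4, 0 the second candidate is rejected by the single byte comparison without scanning its prefix.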


## Optimizations

- There may be a lot of false sharing in lzma.State; check whether this
  can be improved by reorganizing the internal structure of it.
- Check whether batching encoding and decoding improves speed.

### DAG optimizations

- Use full buffer to create minimal bit-length above range encoder.
- Might be too slow (see v0.4)

### Different match finders

- hashes with 2 or 3 characters in addition to 4 characters
- binary trees with 2-7 characters (uint64 as key, use uint32 as
  pointers into an array)
- rb-trees with 2-7 characters (uint64 as key, use uint32 as pointers
  into an array with bit-stealing for the colors)
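
A possible layout for the bit-stealing idea, as a sketch only: the `node` type, its field names and the choice of the top bit of the left index are assumptions and not existing code. Nodes live in a flat slice, children are uint32 indexes, and the color occupies one bit of an index, which limits the slice to 2^31 nodes.

```go
package main

import "fmt"

const (
	colorBit  = uint32(1) << 31 // stolen bit: set means red
	indexMask = colorBit - 1    // lower 31 bits hold the index
)

// node is a hypothetical tree node stored in a flat slice.
type node struct {
	key   uint64 // 2-7 input bytes packed into one word
	left  uint32 // index of the left child; top bit carries the color
	right uint32 // index of the right child
}

// isRed reports whether the stolen bit is set.
func (n *node) isRed() bool { return n.left&colorBit != 0 }

// setRed sets or clears the color without touching the index.
func (n *node) setRed(red bool) {
	if red {
		n.left |= colorBit
	} else {
		n.left &^= colorBit
	}
}

// leftIndex returns the left child index without the color bit.
func (n *node) leftIndex() uint32 { return n.left & indexMask }

// setLeftIndex stores a new left child index and keeps the color.
func (n *node) setLeftIndex(i uint32) {
	n.left = n.left&colorBit | i&indexMask
}

func main() {
	var n node
	n.setLeftIndex(12345)
	n.setRed(true)
	fmt.Println(n.isRed(), n.leftIndex())
}
```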

## Release Procedure

- execute goch -l for all packages; probably with lower param like 0.5.
- check orthography with gospell
- Write release notes in doc/relnotes.
- Update README.md
- xb copyright . in xz directory to ensure all new files have a
  Copyright header
- VERSION=<version> go generate github.com/ulikunitz/xz/... to update
  version files
- Execute tests for Linux/amd64, Linux/x86 and Windows/amd64.
- Update TODO.md - write short log entry
- git checkout master && git merge dev
- git tag -a <version>
- git push

## Log

### 2018-10-28

Release v0.5.5 fixes issue #19 observing ErrLimit outputs.

### 2017-06-05

Release v0.5.4 fixes issue #15, another problem with the padding size
check for the xz block header. I removed the check completely.

### 2017-02-15

Release v0.5.3 fixes issue #12 regarding the decompression of an empty
XZ stream. Many thanks to Tomasz Kłak, who reported the issue.

### 2016-12-02

Release v0.5.2 became necessary to allow the decoding of xz files with
4-byte padding in the block header. Many thanks to Greg, who reported
the issue.

### 2016-07-23

Release v0.5.1 became necessary to fix problems with 32-bit platforms.
Many thanks to Bruno Brigas, who reported the issue.

### 2016-07-04

Release v0.5 provides improvements to the compressor and provides support for
the decompression of xz files with multiple xz streams.

### 2016-01-31

Another compression rate increase came from checking the byte at the
length of the best match first, before checking the whole prefix. This
makes the compressor even faster. We now have a large time budget to
beat the compression ratio of the xz tool. For enwik8 we now have over
40 seconds to reduce the compressed file size by another 7 MiB.

### 2016-01-30

I simplified the encoder. Speed and compression rate increased
dramatically. A high compression rate also affects the decompression
speed. The approach with the buffer and optimizing for operation
compression rate has not been successful. Going for the maximum length
appears to be the best approach.

### 2016-01-28

The release v0.4 is ready. It provides a working xz implementation,
which is rather slow, but works and is interoperable with the xz tool.
It is an important milestone.

### 2016-01-10

I have the first working implementation of an xz reader and writer. I'm
happy about reaching this milestone.

### 2015-12-02

I'm now ready to implement xz because I have a working LZMA2
implementation. I decided today that v0.4 will use the slow encoder
using the operations buffer to be able to go back, if I intend to do so.

### 2015-10-21

I have restarted the work on the library. While trying to implement
LZMA2, I discovered that I need to simplify the encoder and decoder
functions again. The option approach is too complicated. Using a limited
byte writer, not caring about written bytes at all, and not trying to
handle uncompressed data simplifies the LZMA encoder and decoder
considerably. Processing uncompressed data and handling limits is a
feature of the LZMA2 format, not of LZMA.

I learned an interesting method from the LZO format. If the last copy is
too far away, they move the head by 2 bytes instead of 1 byte to reduce
processing time.

### 2015-08-26

I have now reimplemented the lzma package. The code is reasonably fast,
but can still be optimized. The next step is to implement LZMA2 and then
xz.

### 2015-07-05

Created release v0.3. The version is the foundation for a full xz
implementation that is the target of v0.4.

### 2015-06-11

The gflag package has been developed because I couldn't use flag and
pflag for a fully compatible support of gzip's and lzma's options. It
seems to work now quite nicely.

### 2015-06-05

The overflow issue was interesting to research; Henry S. Warren Jr.'s
book Hacker's Delight was very helpful as usual and explained the issue
perfectly. Fefe's information on his website was based on the C FAQ and
quite bad, because it didn't address the issue of -MININT == MININT.
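
The -MININT == MININT case can be reproduced in Go, where signed integer overflow wraps around at run time. The helper `absInt64` below is a hypothetical illustration and not code from the library; it reports the one input whose absolute value is not representable.

```go
package main

import (
	"fmt"
	"math"
)

// absInt64 returns the absolute value of x. The second result is false
// for math.MinInt64, whose absolute value does not fit into an int64.
func absInt64(x int64) (int64, bool) {
	if x == math.MinInt64 {
		return 0, false
	}
	if x < 0 {
		return -x, true
	}
	return x, true
}

func main() {
	x := int64(math.MinInt64)
	// Negating the minimum value wraps around to the same value.
	fmt.Println(-x == x)
	fmt.Println(absInt64(x))
	fmt.Println(absInt64(-5))
}
```

A naive `if x < 0 { x = -x }` therefore still yields a negative number for this one input.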

### 2015-06-04

It has been a productive day. I improved the interface of lzma.Reader
and lzma.Writer and fixed the error handling.

### 2015-06-01

By computing the bit length of the LZMA operations I was able to
improve the greedy algorithm implementation. By using an 8 MByte buffer
the compression rate was not as good as for xz but already better than
gzip default.

Compression is currently slow, but this is something we will be able to
improve over time.

### 2015-05-26

Checked the license of ogier/pflag. The lzmago binary should include
the license terms for the pflag library.

I added the endorsement clause as used by Google for the Go sources to
the LICENSE file.

### 2015-05-22

The package lzb now contains the basic implementation for creating or
reading LZMA byte streams. It supports the implementation of the
DAG-shortest-path algorithm for the compression function.

### 2015-04-23

Completed the lzbase classes yesterday. I'm a little bit concerned that
using the components may require too much code, but on the other hand
there is a lot of flexibility.

### 2015-04-22

Implemented Reader and Writer during the Bayern game against Porto. The
second half gave me enough time.

### 2015-04-21

While showering this morning I discovered that the design for OpEncoder
and OpDecoder doesn't work, because encoding/decoding might depend on
the current status of the dictionary. This is not exactly the right way
to start the day.

Therefore we need to keep the Reader and Writer design. This time around
we simplify it by ignoring size limits. These can be added by wrappers
around the Reader and Writer interfaces. The Parameters type isn't
needed anymore.

However I will implement a ReaderState and WriterState type to use
static typing to ensure the right State object is combined with the
right lzbase.Reader and lzbase.Writer.

As a start I have implemented ReaderState and WriterState to ensure
that the state for reading is only used by readers and WriterState only
used by Writers.

### 2015-04-20

Today I implemented the OpDecoder and tested OpEncoder and OpDecoder.

### 2015-04-08

Came up with a new simplified design for lzbase. I have already
implemented the type State that replaces OpCodec.

### 2015-04-06

The new lzma package is now fully usable and lzmago is using it now. The
old lzma package has been completely removed.

### 2015-04-05

Implemented lzma.Reader and tested it.

### 2015-04-04

Implemented baseReader by adapting code from lzma.Reader.

### 2015-04-03

The opCodec was copied yesterday to lzma2. opCodec has a high
number of dependencies on other files in lzma2. Therefore I had to copy
almost all files from lzma.

### 2015-03-31

Removed only a TODO item.

However, Francesc Campoy's presentation "Go for Javaneros
(Javaïstes?)" contains the idea that by using an embedded field E, all
the methods of E will be defined on T. If E is an interface, T satisfies
E.

https://talks.golang.org/2014/go4java.slide#51

I have never used this, but it seems to be a cool idea.

### 2015-03-30

Finished the type writerDict and wrote a simple test.

### 2015-03-25

I started to implement the writerDict.

### 2015-03-24

After thinking long about the LZMA2 code and several false starts, I
now have a plan to create a self-sufficient lzma2 package that supports
the classic LZMA format as well as LZMA2. The core idea is to support a
baseReader and baseWriter type that support the basic LZMA stream
without any headers. Both types must support the reuse of dictionaries
and the opCodec.

### 2015-01-10

1. Implemented simple lzmago tool
2. Tested tool against large 4.4G file
    - compression worked correctly; tested decompression with lzma
    - decompression hits a full buffer condition
3. Fixed a bug in the compressor and wrote a test for it
4. Executed full cycle for 4.4 GB file; performance can be improved ;-)

### 2015-01-11

- Release v0.2 because of the working LZMA encoder and decoder