Commit | Line | Data |
---|---|---|
15c0b25d AP |
1 | # TODO list |
2 | ||
3 | ## Release v0.6 | |
4 | ||
5 | 1. Review encoder and check for lzma improvements under xz. | |
6 | 2. Fix binary tree matcher. | |
7 | 3. Compare compression ratio with xz tool using comparable parameters | |
8 | and optimize parameters | |
9 | 4. Do some optimizations | |
10 | - rename operation action and make it a simple type of size 8 | |
11 | - make maxMatches, wordSize parameters | |
12 | - stop searching after a certain length is found (parameter sweetLen) | |
13 | ||
14 | ## Release v0.7 | |
15 | ||
16 | 1. Optimize code | |
17 | 2. Do statistical analysis to get linear presets. | |
18 | 3. Test sync.Pool compatibility for xz and lzma Writer and Reader | |
19 | 4. Fuzz optimized code. | |
20 | ||
21 | ## Release v0.8 | |
22 | ||
23 | 1. Support parallel goroutines for writing and reading xz files. | |
24 | 2. Support a ReaderAt interface for xz files with small block sizes. | |
25 | 3. Improve compatibility between gxz and xz | |
26 | 4. Provide manual page for gxz | |
27 | ||
28 | ## Release v0.9 | |
29 | ||
30 | 1. Improve documentation | |
31 | 2. Fuzz again | |
32 | ||
33 | ## Release v1.0 | |
34 | ||
35 | 1. Fully functioning gxz | |
36 | 2. Add godoc URL to README.md (godoc.org) | |
37 | 3. Resolve all issues. | |
38 | 4. Define release candidates. | |
39 | 5. Public announcement. | |
40 | ||
41 | ## Package lzma | |
42 | ||
43 | ### Release v0.6 | |
44 | ||
45 | - Rewrite Encoder into a simple greedy one-op-at-a-time encoder | |
46 | including | |
47 | + simple scan at the dictionary head for the same byte | |
48 | + use the killer byte (requiring matches to get longer, the first | |
49 | test should be the byte that would make the match longer) | |
50 | ||
51 | ||
52 | ## Optimizations | |
53 | ||
54 | - There may be a lot of false sharing in lzma.State; check whether this | |
55 | can be improved by reorganizing the internal structure of it. | |
56 | - Check whether batching encoding and decoding improves speed. | |
57 | ||
58 | ### DAG optimizations | |
59 | ||
60 | - Use full buffer to create minimal bit-length above range encoder. | |
61 | - Might be too slow (see v0.4) | |
62 | ||
63 | ### Different match finders | |
64 | ||
65 | - hashes with 2, 3 characters additional to 4 characters | |
66 | - binary trees with 2-7 characters (uint64 as key, use uint32 as | |
67 | pointers into an array) | |
68 | - rb-trees with 2-7 characters (uint64 as key, use uint32 as pointers | |
69 | into an array with bit-stealing for the colors) | |
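The bit-stealing idea for the rb-tree colors can be sketched as follows. This is only an illustration under assumptions: the type `link`, the constant `colorBit`, and the helper `makeLink` are hypothetical names and not part of the library. The sketch reserves the top bit of a uint32 array index for the node color, which limits the array to 2^31 entries.

```go
package main

import "fmt"

// colorBit is the stolen bit: the most significant bit of the uint32 link.
// Assumption for this sketch: node indices always fit into 31 bits.
const colorBit = 1 << 31

// link is a hypothetical "pointer" into a node array that also carries
// the red/black color of the node it points to.
type link uint32

// makeLink packs an array index and a color into one uint32 value.
func makeLink(index uint32, red bool) link {
	l := link(index &^ colorBit)
	if red {
		l |= colorBit
	}
	return l
}

// index returns the array index with the color bit masked out.
func (l link) index() uint32 { return uint32(l) &^ colorBit }

// red reports whether the color bit is set.
func (l link) red() bool { return l&colorBit != 0 }

func main() {
	l := makeLink(42, true)
	fmt.Println(l.index(), l.red())
}
```

The benefit is that a node needs no separate color field, so the node array stays compact; the cost is one mask operation on every index access.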
70 | ||
71 | ## Release Procedure | |
72 | ||
73 | - execute goch -l for all packages; probably with lower param like 0.5. | |
74 | - check orthography with gospell | |
75 | - Write release notes in doc/relnotes. | |
76 | - Update README.md | |
77 | - xb copyright . in xz directory to ensure all new files have Copyright | |
78 | header | |
79 | - VERSION=<version> go generate github.com/ulikunitz/xz/... to update | |
80 | version files | |
81 | - Execute test for Linux/amd64, Linux/x86 and Windows/amd64. | |
82 | - Update TODO.md - write short log entry | |
83 | - git checkout master && git merge dev | |
84 | - git tag -a <version> | |
85 | - git push | |
86 | ||
87 | ## Log | |
88 | ||
107c1cdb ND |
89 | ### 2018-10-28 |
90 | ||
91 | Release v0.5.5 fixes issue #19 observing ErrLimit outputs. | |
92 | ||
15c0b25d AP |
93 | ### 2017-06-05 |
94 | ||
95 | Release v0.5.4 fixes issue #15, another problem with the padding size | |
96 | check for the xz block header. I removed the check completely. | |
97 | ||
98 | ### 2017-02-15 | |
99 | ||
100 | Release v0.5.3 fixes issue #12 regarding the decompression of an empty | |
101 | XZ stream. Many thanks to Tomasz Kłak, who reported the issue. | |
102 | ||
103 | ### 2016-12-02 | |
104 | ||
105 | Release v0.5.2 became necessary to allow the decoding of xz files with | |
106 | 4-byte padding in the block header. Many thanks to Greg, who reported | |
107 | the issue. | |
108 | ||
107c1cdb | 109 | ### 2016-07-23 |
15c0b25d AP |
110 | |
111 | Release v0.5.1 became necessary to fix problems with 32-bit platforms. | |
112 | Many thanks to Bruno Brigas, who reported the issue. | |
113 | ||
114 | ### 2016-07-04 | |
115 | ||
116 | Release v0.5 provides improvements to the compressor and provides support for | |
117 | the decompression of xz files with multiple xz streams. | |
118 | ||
119 | ### 2016-01-31 | |
120 | ||
121 | Another compression rate increase by checking the byte at length of the | |
122 | best match first, before checking the whole prefix. This makes the | |
123 | compressor even faster. We now have a large time budget to beat the | |
124 | compression ratio of the xz tool. For enwik8 we now have over 40 seconds | |
125 | to reduce the compressed file size by another 7 MiB. | |
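The check described in this entry can be sketched roughly as below. It is a simplified illustration, not the encoder's actual code: `matchLen`, `longestMatch`, and the flat `candidates` slice are hypothetical, and the sketch assumes every candidate position lies before `head`. A candidate can only beat the current best match if it also agrees at the byte just past the best length, so that single byte is compared before the whole prefix.

```go
package main

import "fmt"

// matchLen counts how many bytes agree between data[i:] and data[j:].
// Assumption: i < j, so only j needs a bounds check.
func matchLen(data []byte, i, j int) int {
	n := 0
	for j+n < len(data) && data[i+n] == data[j+n] {
		n++
	}
	return n
}

// longestMatch returns the candidate position with the longest match
// against data[head:], or pos == -1 if no candidate matches at all.
func longestMatch(data []byte, head int, candidates []int) (pos, length int) {
	pos = -1
	for _, c := range candidates {
		// Test the byte at the current best length first. If it differs,
		// this candidate cannot produce a longer match and is skipped
		// without comparing the prefix.
		if head+length >= len(data) || data[c+length] != data[head+length] {
			continue
		}
		if n := matchLen(data, c, head); n > length {
			pos, length = c, n
		}
	}
	return pos, length
}

func main() {
	data := []byte("abcdXabcdeYabcde")
	pos, length := longestMatch(data, 11, []int{0, 5})
	fmt.Println(pos, length)
}
```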
126 | ||
127 | ### 2016-01-30 | |
128 | ||
129 | I simplified the encoder. Speed and compression rate increased | |
130 | dramatically. A high compression rate also affects the decompression | |
131 | speed. The approach with the buffer and optimizing for operation | |
132 | compression rate has not been successful. Going for the maximum length | |
133 | appears to be the best approach. | |
134 | ||
135 | ### 2016-01-28 | |
136 | ||
137 | The release v0.4 is ready. It provides a working xz implementation, | |
138 | which is rather slow, but works and is interoperable with the xz tool. | |
139 | It is an important milestone. | |
140 | ||
141 | ### 2016-01-10 | |
142 | ||
143 | I have the first working implementation of an xz reader and writer. I'm | |
144 | happy about reaching this milestone. | |
145 | ||
146 | ### 2015-12-02 | |
147 | ||
148 | I'm now ready to implement xz because I have a working LZMA2 | |
149 | implementation. I decided today that v0.4 will use the slow encoder | |
150 | using the operations buffer to be able to go back, if I intend to do so. | |
151 | ||
152 | ### 2015-10-21 | |
153 | ||
154 | I have restarted the work on the library. While trying to implement | |
155 | LZMA2, I discovered that I need to simplify the encoder and decoder | |
156 | functions again. The option approach is too complicated. Using a limited | |
157 | byte writer, not tracking written bytes at all, and not trying to handle | |
158 | uncompressed data simplifies the LZMA encoder and decoder considerably. | |
159 | Processing uncompressed data and handling limits is a feature of the | |
160 | LZMA2 format not of LZMA. | |
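A limited byte writer of the kind mentioned above might look like the following sketch. The names `limitedByteWriter`, `errLimit`, and `writeAll` are hypothetical and chosen for illustration; the library's real types may differ. The wrapper refuses further bytes once its budget is used up, so the encoder itself does not need to count output.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// errLimit is a hypothetical error signalling that the byte budget is spent.
var errLimit = errors.New("write limit reached")

// limitedByteWriter wraps an io.ByteWriter and allows at most n more bytes.
type limitedByteWriter struct {
	w io.ByteWriter
	n int64
}

// WriteByte forwards c to the wrapped writer while budget remains.
func (l *limitedByteWriter) WriteByte(c byte) error {
	if l.n <= 0 {
		return errLimit
	}
	if err := l.w.WriteByte(c); err != nil {
		return err
	}
	l.n--
	return nil
}

// writeAll is a small helper for the demonstration: it writes data byte
// by byte through a limited writer and reports how many bytes got through.
func writeAll(limit int64, data []byte) (written int, err error) {
	var buf bytes.Buffer
	lw := &limitedByteWriter{w: &buf, n: limit}
	for _, c := range data {
		if err = lw.WriteByte(c); err != nil {
			return buf.Len(), err
		}
	}
	return buf.Len(), nil
}

func main() {
	n, err := writeAll(3, []byte("hello"))
	fmt.Println(n, err)
}
```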
161 | ||
162 | I learned an interesting method from the LZO format. If the last copy is | |
163 | too far away, the head is moved by 2 bytes instead of 1 byte to | |
164 | reduce processing time. | |
165 | ||
166 | ### 2015-08-26 | |
167 | ||
168 | I have now reimplemented the lzma package. The code is reasonably fast, | |
169 | but can still be optimized. The next step is to implement LZMA2 and then | |
170 | xz. | |
171 | ||
172 | ### 2015-07-05 | |
173 | ||
174 | Created release v0.3. The version is the foundation for a full xz | |
175 | implementation that is the target of v0.4. | |
176 | ||
177 | ### 2015-06-11 | |
178 | ||
179 | The gflag package has been developed because I couldn't use flag and | |
180 | pflag for a fully compatible support of gzip's and lzma's options. It | |
181 | seems to work now quite nicely. | |
182 | ||
183 | ### 2015-06-05 | |
184 | ||
185 | The overflow issue was interesting to research; Henry S. Warren | |
186 | Jr.'s book Hacker's Delight was very helpful as usual and explained the | |
187 | issue perfectly. Fefe's information on his website was based on the | |
188 | C FAQ and quite bad, because it didn't address the issue of -MININT == | |
189 | MININT. | |
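The -MININT == MININT issue can be reproduced in Go, where signed integer arithmetic on non-constant values wraps around in two's complement. The sketch below is only a demonstration of the pitfall; `negWraps` and `abs32` are hypothetical helpers and not code from the library.

```go
package main

import (
	"fmt"
	"math"
)

// negWraps reports whether negating x yields x again although x is not 0.
// For int32 this is true only for the minimum value.
func negWraps(x int32) bool {
	return x != 0 && -x == x
}

// abs32 returns the absolute value of x and false if it cannot be
// represented, which is the case for math.MinInt32.
func abs32(x int32) (int32, bool) {
	if x == math.MinInt32 {
		return 0, false
	}
	if x < 0 {
		return -x, true
	}
	return x, true
}

func main() {
	x := int32(math.MinInt32)
	fmt.Println(negWraps(x)) // negation wraps back to the minimum value
	fmt.Println(abs32(x))
	fmt.Println(abs32(-5))
}
```

Any overflow check that relies on negating a value therefore has to treat the minimum integer as a special case.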
190 | ||
191 | ### 2015-06-04 | |
192 | ||
193 | It has been a productive day. I improved the interface of lzma.Reader | |
194 | and lzma.Writer and fixed the error handling. | |
195 | ||
196 | ### 2015-06-01 | |
197 | ||
198 | By computing the bit length of the LZMA operations I was able to | |
199 | improve the greedy algorithm implementation. By using an 8 MByte buffer | |
200 | the compression rate was not as good as for xz but already better than | |
107c1cdb | 201 | gzip default. |
15c0b25d AP |
202 | |
203 | Compression is currently slow, but this is something we will be able to | |
204 | improve over time. | |
205 | ||
206 | ### 2015-05-26 | |
207 | ||
208 | Checked the license of ogier/pflag. The lzmago binary should | |
209 | include the license terms for the pflag library. | |
210 | ||
211 | I added the endorsement clause as used by Google for the Go sources to the | |
212 | LICENSE file. | |
213 | ||
214 | ### 2015-05-22 | |
215 | ||
216 | The package lzb now contains the basic implementation for creating or | |
217 | reading LZMA byte streams. It supports the implementation | |
218 | of the DAG-shortest-path algorithm for the compression function. | |
219 | ||
107c1cdb | 220 | ### 2015-04-23 |
15c0b25d AP |
221 | |
222 | Completed yesterday the lzbase classes. I'm a little bit concerned that | |
223 | using the components may require too much code, but on the other hand | |
224 | there is a lot of flexibility. | |
225 | ||
226 | ### 2015-04-22 | |
227 | ||
228 | Implemented Reader and Writer during the Bayern game against Porto. The | |
229 | second half gave me enough time. | |
230 | ||
231 | ### 2015-04-21 | |
232 | ||
233 | While showering this morning I discovered that the design for OpEncoder | |
234 | and OpDecoder doesn't work, because encoding/decoding might depend on | |
235 | the current status of the dictionary. This is not exactly the right way | |
236 | to start the day. | |
237 | ||
238 | Therefore we need to keep the Reader and Writer design. This time around | |
239 | we simplify it by ignoring size limits. These can be added by wrappers | |
240 | around the Reader and Writer interfaces. The Parameters type isn't | |
241 | needed anymore. | |
242 | ||
243 | However I will implement a ReaderState and WriterState type to use | |
244 | static typing to ensure the right State object is combined with the | |
245 | right lzbase.Reader and lzbase.Writer. | |
246 | ||
247 | As a start I have implemented ReaderState and WriterState to ensure | |
248 | that ReaderState is only used by readers and WriterState is only | |
107c1cdb | 249 | used by Writers. |
15c0b25d AP |
250 | |
251 | ### 2015-04-20 | |
252 | ||
253 | Today I implemented the OpDecoder and tested OpEncoder and OpDecoder. | |
254 | ||
255 | ### 2015-04-08 | |
256 | ||
257 | Came up with a new simplified design for lzbase. I have already implemented | |
258 | the type State that replaces OpCodec. | |
259 | ||
260 | ### 2015-04-06 | |
261 | ||
262 | The new lzma package is now fully usable and lzmago is using it now. The | |
263 | old lzma package has been completely removed. | |
264 | ||
265 | ### 2015-04-05 | |
266 | ||
267 | Implemented lzma.Reader and tested it. | |
268 | ||
269 | ### 2015-04-04 | |
270 | ||
271 | Implemented baseReader by adapting code from lzma.Reader. | |
272 | ||
273 | ### 2015-04-03 | |
274 | ||
275 | The opCodec has been copied yesterday to lzma2. opCodec has a high | |
276 | number of dependencies on other files in lzma2. Therefore I had to copy | |
277 | almost all files from lzma. | |
278 | ||
279 | ### 2015-03-31 | |
280 | ||
107c1cdb | 281 | Removed only a TODO item. |
15c0b25d AP |
282 | |
283 | However, in Francesc Campoy's presentation "Go for Javaneros | |
284 | (Javaïstes?)" is the idea that using an embedded field E, all the | |
285 | methods of E will be defined on T. If E is an interface, T satisfies E. | |
286 | ||
287 | https://talks.golang.org/2014/go4java.slide#51 | |
288 | ||
289 | I have never used this, but it seems to be a cool idea. | |
290 | ||
291 | ### 2015-03-30 | |
292 | ||
293 | Finished the type writerDict and wrote a simple test. | |
294 | ||
295 | ### 2015-03-25 | |
296 | ||
297 | I started to implement the writerDict. | |
298 | ||
299 | ### 2015-03-24 | |
300 | ||
301 | After thinking long about the LZMA2 code and several false starts, I | |
302 | now have a plan to create a self-sufficient lzma2 package that supports | |
303 | the classic LZMA format as well as LZMA2. The core idea is to support a | |
304 | baseReader and baseWriter type that support the basic LZMA stream | |
305 | without any headers. Both types must support the reuse of dictionaries | |
306 | and the opCodec. | |
307 | ||
308 | ### 2015-01-10 | |
309 | ||
310 | 1. Implemented simple lzmago tool | |
311 | 2. Tested tool against large 4.4G file | |
312 | - compression worked correctly; tested decompression with lzma | |
313 | - decompression hits a full buffer condition | |
314 | 3. Fixed a bug in the compressor and wrote a test for it | |
315 | 4. Executed full cycle for 4.4 GB file; performance can be improved ;-) | |
316 | ||
317 | ### 2015-01-11 | |
318 | ||
319 | - Release v0.2 because of the working LZMA encoder and decoder |