|
|
 | | From: | Eric D. Brown | | Subject: | My RLE Method... | | Date: | Sun, 16 Jan 2005 18:41:30 -0600 |
|
|
 | I don't know if this had been used before, but I'd like some feedback on how you can optimize RLE (and how optimized this is). This is an RLE enocder that I made:
The # is a bit-place that represents the run-length size.
Original RL - Flag - RL Encoded Size (without the Flag)
1 0 N/A 2-3 10 # (So, 2 = 10+0; 3 = 10+1) 4-5 110 # (So, 4 = 110+0; 5 = 110+1) 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.) 10-17 11110 ### (So, 10 = 11110+000; etc.) 18-33 111110 #### (So, 18 = 111110+0000; etc.) etc.
With this method, the only original run-length size with a larger encoded size is 2 (which encodes to 3 bits). Any original run-lengths of 1,3,4 and 6 have an encoded size that is the same as the original run-length size. Any other original run-length is encoded with compression.
Again, how optimized is this method?
Respectfully, Eric D. Brown
|
|
 | | From: | cerelaz | | Subject: | Re: My RLE Method... | | Date: | Mon, 17 Jan 2005 19:33:27 GMT |
|
|
 | > 1 0 N/A > 2-3 10 # (So, 2 = 10+0; 3 = 10+1) > 4-5 110 # (So, 4 = 110+0; 5 = 110+1) > 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.) > 10-17 11110 ### (So, 10 = 11110+000; etc.) > 18-33 111110 #### (So, 18 = 111110+0000; etc.) > etc.
this is Elias Gamma (negative)
|
|
 | | From: | Jim Leonard | | Subject: | Re: My RLE Method... | | Date: | 16 Jan 2005 17:48:04 -0800 |
|
|
 | Eric D. Brown wrote: > With this method, the only original run-length size with a larger > encoded size is 2 (which encodes to 3 bits). Any original run-lengths > of 1,3,4 and 6 have an encoded size that is the same as the original > run-length size. Any other original run-length is encoded with compression. > > Again, how optimized is this method?
It depends on your source data, frankly. For example, your scheme would do poorly on data that was comprised mainly of runs of 2.
There are many ways to skin a cat, and many ways to RLE data -- you just have to pick the best method for your data, if you know it beforehand. For example, here's how several methods of RLE fare with the following original data "RTAAAASDEEEEE":
control char method, runs: (* represents character that signifies run) RT*4ASD*5E
control char method, literals: (* represents character that signifies literal) *R*T4A*S*D5E
repeating char method: (double-char indicates a run) RTAA4SDEE5
MS RLE 8-bit: 02RT4A02SD5E
PDF/TIFF/Packbits (~ represents number > 127): 1RT~A1SD~E
So which is best? I couldn't decide, so for a recent project I wrote a library that tested each one on a block of data, then used the one that output the smallest compressed block and flagged which method achieved it.
|
|
 | | From: | cr88192 | | Subject: | Re: My RLE Method... | | Date: | Tue, 18 Jan 2005 02:08:38 GMT |
|
|
 | "Eric D. Brown" wrote in message news:ZTDGd.7385$263.4166@fe06.lga... >I don't know if this had been used before, but I'd like some feedback on >how you can optimize RLE (and how optimized this is). This is an RLE >enocder that I made: > > The # is a bit-place that represents the run-length size. > > Original RL - Flag - RL Encoded Size (without the Flag) > > 1 0 N/A > 2-3 10 # (So, 2 = 10+0; 3 = 10+1) > 4-5 110 # (So, 4 = 110+0; 5 = 110+1) > 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.) > 10-17 11110 ### (So, 10 = 11110+000; etc.) > 18-33 111110 #### (So, 18 = 111110+0000; etc.) > etc. > > With this method, the only original run-length size with a larger encoded > size is 2 (which encodes to 3 bits). Any original run-lengths of 1,3,4 > and 6 have an encoded size that is the same as the original run-length > size. Any other original run-length is encoded with compression. > > Again, how optimized is this method? > dunno.
it makes me think: how about adding each bit doubles the range? 0: 1 10: 2 110: 4 1110: 8 11110: 16 ....
11110 10: 18 (7 bits)
effectively, each bit would cost a certain number of bits, and large or more complex ones would cost more.
11110 110 10 0: 23 (11 bits)
which may not be acceptable.
other possibilities could include the fibonacci sequence: 0: 1 10: 2 110: 3 1110: 5 11110: 8 111110: 13 1111110: 21
111110 1110: 18 (10 bits) 1111110 10: 23 (9 bits)
or, one could try squares.
....
yes, lots of testing. one could try different sets of data, and compute the average number of bits for each run, and try to narrow down what are "good" ways of encoding length.
now, recently for my stuff I am considering focusing a little more on and using (bytewise) rle and lz coding schemes, mostly under the reasoning that they decode a lot faster than, eg, a huffman+rle or huffman+lz based algo, even if they use a little more space...
eg: thoughts are a lz+rle bytewise encoder/decoder. note: like my current bytewise rle encoder, the encoders would probably make a pass over the data to choose the "magic" values, which are typically the least likely possible (often the values are unused).
these are often included at the start of the stream to indicate which ones they are. I am also considering the possibility of a "sanity magic".
eg: "PDLZ"
but then again, I could be different, eg: I could skip out on the 2 tags, and have a set of prefixed runs: run prefix byte lz (rid<0): rle (rid>0): raw (rid=0):
or, rid could be divided into 4 sets of 64.
....
so many possibilities for something so trivial, at least the algos are pretty easy to write, some experimenting may work in determining which approaches work well.
|
|
 | | From: | cr88192 | | Subject: | followup (Re: My RLE Method...) | | Date: | Tue, 18 Jan 2005 04:17:12 GMT |
|
|
 | "cr88192" wrote in message news:Gc_Gd.656$rc4.503@fe07.usenetserver.com... > > "Eric D. Brown" wrote in message > news:ZTDGd.7385$263.4166@fe06.lga... >>I don't know if this had been used before, but I'd like some feedback on >>how you can optimize RLE (and how optimized this is). This is an RLE >>enocder that I made: >> >> The # is a bit-place that represents the run-length size. >> >> Original RL - Flag - RL Encoded Size (without the Flag) >> >> 1 0 N/A >> 2-3 10 # (So, 2 = 10+0; 3 = 10+1) >> 4-5 110 # (So, 4 = 110+0; 5 = 110+1) >> 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.) >> 10-17 11110 ### (So, 10 = 11110+000; etc.) >> 18-33 111110 #### (So, 18 = 111110+0000; etc.) >> etc. >> >> With this method, the only original run-length size with a larger encoded >> size is 2 (which encodes to 3 bits). Any original run-lengths of 1,3,4 >> and 6 have an encoded size that is the same as the original run-length >> size. Any other original run-length is encoded with compression. >> >> Again, how optimized is this method? >> > dunno. >
crap, should have mentioned the "stair step" (or whatever it is called) approach.
eg: you define a step size, eg, which is used for encoding runs. certain values are used to indicate the next step. ....
eg: 3 bits: 0-6: 000-110 (3 bits) 7-13: 111 000-111 110 (6 bits) 14-21: 111 111 000-111 111 110 (9 bits) ....
4 bits: 0-14: 0000-1110 (4 bits) 15-29: 1111 0000-1111 1110 (8 bits) 30-44: 1111 1111 0000-1111 1111 1110 (12 bits) ....
5 bits: 0-62: 00000-11110 63-125: 11111 00000-11111 11110 ....
more bits, more shallow, less bits, less shallow. which is best depends on the "steepness" of the values.
variations of this approach are, eg, used in jpeg and similar.
|
|
 | | From: | Jim Leonard | | Subject: | Re: My RLE Method... | | Date: | 18 Jan 2005 08:08:00 -0800 |
|
|
 | cr88192 wrote: > now, recently for my stuff I am considering focusing a little more on and > using (bytewise) rle and lz coding schemes, mostly under the reasoning that > they decode a lot faster than, eg, a huffman+rle or huffman+lz based algo, > even if they use a little more space...
Yes, I didn't clarify in my original post that all of my compression work is specialized for low-complexity hardware that does not do bitwise operations terribly well. As a result, I have focused mainly on byte-aligned codes and low-complexity decompression such as RLE and LZ77-based schemes. My time spent on compression is roughly divided 10% to assembler-optimizing the decompression routines and 90% on increasing the effectiveness of the compressor.
It may be worth noting that, in my quest to find the fastest decompression routine possible for my architecture, there is only one thing faster than RLE or LZ77: Compiled data. Meaning, the output of my compressor is not run/length data or offset/length pairs, but rather the actual bytecode necessary to reconstruct the source data (using methods of both RLE and LZ77). Since my architecture uses 1- or 2-byte opcodes for most instructions, it does not expand the compressed representation much more than byte-aligned codes would. To expand the data, I patch the destination pointer into the bytecode, and then jump (JMP) to its entry point. The end of the data is a subroutine return code (RET) so that program execution is passed back to the caller. The speed advantage of this method over that of typical code is that the compressed data does not have to be moved into and out of registers -- it is instead literals embedded the executed code itself -- saving memory accesses.
Anyway, I thought the above might be interesting, as this is an RLE thread, and although RLE is used more as a preprocessor for other compression methods, some of us still use it because we need raw decompression speed :-)
|
|
 | | From: | cr88192 | | Subject: | Re: My RLE Method... | | Date: | Tue, 18 Jan 2005 18:11:11 GMT |
|
|
 | "Jim Leonard" wrote in message news:1106064480.373331.120810@f14g2000cwb.googlegroups.com... > cr88192 wrote: >> now, recently for my stuff I am considering focusing a little more on > and >> using (bytewise) rle and lz coding schemes, mostly under the > reasoning that >> they decode a lot faster than, eg, a huffman+rle or huffman+lz based > algo, >> even if they use a little more space... > > Yes, I didn't clarify in my original post that all of my compression > work is specialized for low-complexity hardware that does not do > bitwise operations terribly well. As a result, I have focused mainly > on byte-aligned codes and low-complexity decompression such as RLE and > LZ77-based schemes. My time spent on compression is roughly divided > 10% to assembler-optimizing the decompression routines and 90% on > increasing the effectiveness of the compressor. > yes, ok.
I am on a desktop, but I am impatient and want fast decompression for some things. fully uncompressed data is wastes and incures a delay in reading from disk (say, 10-30MB takes a bit longer to read-in than 1MB, note: fairly low entropy), and more compressed data incures a delay in decompression.
I am trying for fast loading if possible, rle+lz77 makes sense.
> It may be worth noting that, in my quest to find the fastest > decompression routine possible for my architecture, there is only one > thing faster than RLE or LZ77: Compiled data. Meaning, the output of > my compressor is not run/length data or offset/length pairs, but rather > the actual bytecode necessary to reconstruct the source data (using > methods of both RLE and LZ77). Since my architecture uses 1- or 2-byte > opcodes for most instructions, it does not expand the compressed > representation much more than byte-aligned codes would. To expand the > data, I patch the destination pointer into the bytecode, and then jump > (JMP) to its entry point. The end of the data is a subroutine return > code (RET) so that program execution is passed back to the caller. The > speed advantage of this method over that of typical code is that the > compressed data does not have to be moved into and out of registers -- > it is instead literals embedded the executed code itself -- saving > memory accesses. > not so good for pc's... also makes the algo not so general.
> Anyway, I thought the above might be interesting, as this is an RLE > thread, and although RLE is used more as a preprocessor for other > compression methods, some of us still use it because we need raw > decompression speed :-) > yes.
I use a hybrid, as afaik lz would generate less-than-optimal coding of runs afaik. why have a lot of copying when I could have a fill?...
I skipped out on the header though. basically I just give the 2 tags: followed by the encoded data.
rle-tag len val lz-tag len offs-low offs-high
or such...
|
|
 | | From: | Matt Mahoney | | Subject: | Re: My RLE Method... | | Date: | 16 Jan 2005 17:12:59 -0800 |
|
|
 | Eric D. Brown wrote: > I don't know if this had been used before, but I'd like some feedback on > how you can optimize RLE (and how optimized this is). This is an RLE
> enocder that I made: > > The # is a bit-place that represents the run-length size. > > Original RL - Flag - RL Encoded Size (without the Flag) > > 1 0 N/A > 2-3 10 # (So, 2 = 10+0; 3 = 10+1) > 4-5 110 # (So, 4 = 110+0; 5 = 110+1) > 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.) > 10-17 11110 ### (So, 10 = 11110+000; etc.) > 18-33 111110 #### (So, 18 = 111110+0000; etc.) > etc. > > With this method, the only original run-length size with a larger > encoded size is 2 (which encodes to 3 bits). Any original run-lengths > of 1,3,4 and 6 have an encoded size that is the same as the original > run-length size. Any other original run-length is encoded with compression. > > Again, how optimized is this method? > > Respectfully, > Eric D. Brown
It looks reasonable, but it depends on how run lengths are distributed in your data. When you assign a code of length n bits, you are implicitly assuming its probability is 2^-n.
-- Matt Mahoney
|
|
 | | From: | bybell at rocketmail.com | | Subject: | Re: My RLE Method... | | Date: | 19 Jan 2005 17:29:37 -0800 |
|
|
 | Jim Leonard wrote:
> Yes, of course. But keep in mind this is a specialized application > (60Hz display of animation data) and the method was derived because I > needed the absolute fastest decompression speed. It was never meant to > be adapted for modern machines (other than maintaining compatibility) > or non-x86 hardware. If portability is really a concern, a transcoder > could easily find the machine code patterns and recode the run/length, > literal, and offset/length pairs into a more generic and/or optimal > code for storage/transmission.
Fun, google groups ate my message by asking to sign in I previewed it...
Anyway, you might want to try multiple compression types per frame:
1) straight RLE 2) RLE of the XOR diff of two adjacent frames. Instead of doing the read->modify->write for a run of zero XORs you increment the dest pointer.
....then pick the compression type per frame which gives you the best speed.
-t
|
|
 | | From: | Phil Carmody | | Subject: | Re: My RLE Method... | | Date: | 18 Jan 2005 11:05:22 +0200 |
|
|
 | "Eric D. Brown" writes: > I don't know if this had been used before, but I'd like some feedback > on how you can optimize RLE (and how optimized this is). This is an > RLE enocder that I made: > > The # is a bit-place that represents the run-length size. > > Original RL - Flag - RL Encoded Size (without the Flag) > > 1 0 N/A > 2-3 10 # (So, 2 = 10+0; 3 = 10+1) > 4-5 110 # (So, 4 = 110+0; 5 = 110+1) > 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.) > 10-17 11110 ### (So, 10 = 11110+000; etc.) > 18-33 111110 #### (So, 18 = 111110+0000; etc.) > etc. > > With this method, the only original run-length size with a larger > encoded size is 2 (which encodes to 3 bits). Any original run-lengths > of 1,3,4 and 6 have an encoded size that is the same as the original > run-length size. Any other original run-length is encoded with > compression. > > Again, how optimized is this method?
Plot the -ve of log (base 2, conventionally) of the probability of symbol (i.e. run length) n against the length of its code.
Now compare that graph with similar ones for other universal codes. (Elial, Fibinacci, Golomb, etc. etc.)
Whichever is closest to a straight line x=y will be the coding scheme that is best for your data.
Questions to c.c - where was that excellent page that documented almost all common universal codes? It was maintained by a c.c. semi-regular, IIRC? Google can't find it (or I can't, should I say).
Phil -- The answer to life's mystery is simple and direct: Sex and death. -- Ian 'Lemmy' Kilminster.
|
|
 | | From: | Jim Leonard | | Subject: | Re: My RLE Method... | | Date: | 18 Jan 2005 15:48:36 -0800 |
|
|
 | cr88192 wrote: > > It may be worth noting that, in my quest to find the fastest > > decompression routine possible for my architecture, there is only one > > thing faster than RLE or LZ77: Compiled data. Meaning, the output of > > my compressor is not run/length data or offset/length pairs, but rather > > the actual bytecode necessary to reconstruct the source data (using > > methods of both RLE and LZ77). Since my architecture uses 1- or 2-byte > > opcodes for most instructions, it does not expand the compressed > > representation much more than byte-aligned codes would. To expand the > > data, I patch the destination pointer into the bytecode, and then jump > > (JMP) to its entry point. The end of the data is a subroutine return > > code (RET) so that program execution is passed back to the caller. The > > speed advantage of this method over that of typical code is that the > > compressed data does not have to be moved into and out of registers -- > > it is instead literals embedded the executed code itself -- saving > > memory accesses. > > > not so good for pc's...
Actually, my target *is* a PC -- a 4.77MHz 8088 IBM Model 5150 from 1982, to be exact. But I know what you meant :-)
> also makes the algo not so general.
Actually, it still is, because the output of the compressor (really a "compiler") takes one of the following two forms (assume DS:SI and ES:DI are already set up for illustration):
# repeating a value MOV AX,[word value to repeat] MOV CX,[repeat count, multiple of 2] REP STOSW
or:
# copying previously-seen data MOV CX,[repeat count, multiple of 2] REP MOVSW
or:
# outputting a literal MOV [DI],[word literal]
Since the sequences are always of these forms, it is easily possible to just parse the machine code as a "flag" and run the data through a generic decompressor.
|
|
 | | From: | cr88192 | | Subject: | Re: My RLE Method... | | Date: | Wed, 19 Jan 2005 03:26:17 GMT |
|
|
 | "Jim Leonard" wrote in message news:1106092116.504496.122120@c13g2000cwb.googlegroups.com... > cr88192 wrote: >> > It may be worth noting that, in my quest to find the fastest >> > decompression routine possible for my architecture, there is only > one >> > thing faster than RLE or LZ77: Compiled data. Meaning, the output > of >> > my compressor is not run/length data or offset/length pairs, but > rather >> > the actual bytecode necessary to reconstruct the source data (using >> > methods of both RLE and LZ77). Since my architecture uses 1- or > 2-byte >> > opcodes for most instructions, it does not expand the compressed >> > representation much more than byte-aligned codes would. To expand > the >> > data, I patch the destination pointer into the bytecode, and then > jump >> > (JMP) to its entry point. The end of the data is a subroutine > return >> > code (RET) so that program execution is passed back to the caller. > The >> > speed advantage of this method over that of typical code is that > the >> > compressed data does not have to be moved into and out of registers > -- >> > it is instead literals embedded the executed code itself -- saving >> > memory accesses. >> > >> not so good for pc's... > > Actually, my target *is* a PC -- a 4.77MHz 8088 IBM Model 5150 from > 1982, to be exact. But I know what you meant :-) > oh, ok...
>> also makes the algo not so general. > > Actually, it still is, because the output of the compressor (really a > "compiler") takes one of the following two forms (assume DS:SI and > ES:DI are already set up for illustration): > > # repeating a value > MOV AX,[word value to repeat] > MOV CX,[repeat count, multiple of 2] > REP STOSW > > or: > > # copying previously-seen data > MOV CX,[repeat count, multiple of 2] > REP MOVSW > > or: > > # outputting a literal > MOV [DI],[word literal] > > Since the sequences are always of these forms, it is easily possible to > just parse the machine code as a "flag" and run the data through a > generic decompressor. > ok.
I still don't like the idea so much as "normal" data compression though, especially since it would not be really useful on something other than an 8088. the extra processing might make it otherwise slower than a normal rle/lz77 decoder on a modern pc.
or such.
I have classes that start today...
|
|
 | | From: | Jim Leonard | | Subject: | Re: My RLE Method... | | Date: | 19 Jan 2005 10:00:22 -0800 |
|
|
 | cr88192 wrote: > I still don't like the idea so much as "normal" data compression though, > especially since it would not be really useful on something other than an > 8088. the extra processing might make it otherwise slower than a normal > rle/lz77 decoder on a modern pc.
Yes, of course. But keep in mind this is a specialized application (60Hz display of animation data) and the method was derived because I needed the absolute fastest decompression speed. It was never meant to be adapted for modern machines (other than maintaining compatibility) or non-x86 hardware. If portability is really a concern, a transcoder could easily find the machine code patterns and recode the run/length, literal, and offset/length pairs into a more generic and/or optimal code for storage/transmission.
Anyway, it's nice to have an opportunity to chime in here now and then. With heavyweights like Matt Mahoney and Malcom Taylor and others dropping in, I feel quite outclassed most of the time :-) Most of the discussion here is on improving compression performance (as opposed to execution performance) so my ears perk up whenever someone mentions RLE or LZ77.
|
|
 | | From: | cr88192 | | Subject: | Re: My RLE Method... | | Date: | Thu, 20 Jan 2005 01:12:08 GMT |
|
|
 | "Jim Leonard" wrote in message news:1106157622.444332.128500@z14g2000cwz.googlegroups.com... > cr88192 wrote: >> I still don't like the idea so much as "normal" data compression > though, >> especially since it would not be really useful on something other > than an >> 8088. the extra processing might make it otherwise slower than a > normal >> rle/lz77 decoder on a modern pc. > > Yes, of course. But keep in mind this is a specialized application > (60Hz display of animation data) and the method was derived because I > needed the absolute fastest decompression speed. It was never meant to > be adapted for modern machines (other than maintaining compatibility) > or non-x86 hardware. If portability is really a concern, a transcoder > could easily find the machine code patterns and recode the run/length, > literal, and offset/length pairs into a more generic and/or optimal > code for storage/transmission. > yes, makes sense.
my motivation is more trying to get fast loading of resources from a "cache". eg: I load input data and build cached versions, for which later it should be possible to load and decompress quickly to avoid waiting all that long.
a lot of the data that is compressed is things like lightmaps, generated graphics, visability bitmaps, misc other data, ... eg: stuff that tends to take up a lot of extra space.
> Anyway, it's nice to have an opportunity to chime in here now and then. > With heavyweights like Matt Mahoney and Malcom Taylor and others > dropping in, I feel quite outclassed most of the time :-) Most of the > discussion here is on improving compression performance (as opposed to > execution performance) so my ears perk up whenever someone mentions RLE > or LZ77. > yes, speed is a relevant issue imo.
eg, was doing arithmatic coding at one point, and it was fairly slow and I could do little about it without stepping into patented-land. so, I started using a huffman like coding, which was tolerably fast for most things. for these I had written some rle and lz encoders, however, these had assumed a 12 bit coding space (using shorts for input/output with the entropy coder).
I thus started as a base using one of my somewhat older rle coders, and taking the lz77 stuff from the one I was using for entropy coding and modifying it to work in the new function (necessary because the 2 had used the vars differently).
alas, given the start of college classes I haven't really had time to test it yet (among other things I have been meaning to mess with).
sometime I was also considering getting around to messing with lz78, but haven't yet...
|
|
|