knowledge-database (beta)

Current group: comp.compression.

My RLE Method...

My RLE Method...  
Eric D. Brown
 Re: My RLE Method...  
cerelaz
 Re: My RLE Method...  
Jim Leonard
 Re: My RLE Method...  
cr88192
 followup (Re: My RLE Method...)  
cr88192
 Re: My RLE Method...  
Jim Leonard
 Re: My RLE Method...  
cr88192
 Re: My RLE Method...  
Matt Mahoney
 Re: My RLE Method...  
bybell at rocketmail.com
 Re: My RLE Method...  
Phil Carmody
 Re: My RLE Method...  
Jim Leonard
 Re: My RLE Method...  
cr88192
 Re: My RLE Method...  
Jim Leonard
 Re: My RLE Method...  
cr88192
From:Eric D. Brown
Subject:My RLE Method...
Date:Sun, 16 Jan 2005 18:41:30 -0600
I don't know if this had been used before, but I'd like some feedback on
how you can optimize RLE (and how optimized this is). This is an RLE
enocder that I made:

The # is a bit-place that represents the run-length size.

Original RL - Flag - RL Encoded Size (without the Flag)

1 0 N/A
2-3 10 # (So, 2 = 10+0; 3 = 10+1)
4-5 110 # (So, 4 = 110+0; 5 = 110+1)
6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.)
10-17 11110 ### (So, 10 = 11110+000; etc.)
18-33 111110 #### (So, 18 = 111110+0000; etc.)
etc.

With this method, the only original run-length size with a larger
encoded size is 2 (which encodes to 3 bits). Any original run-lengths
of 1,3,4 and 6 have an encoded size that is the same as the original
run-length size. Any other original run-length is encoded with compression.

Again, how optimized is this method?

Respectfully,
Eric D. Brown
From:cerelaz
Subject:Re: My RLE Method...
Date:Mon, 17 Jan 2005 19:33:27 GMT

> 1 0 N/A
> 2-3 10 # (So, 2 = 10+0; 3 = 10+1)
> 4-5 110 # (So, 4 = 110+0; 5 = 110+1)
> 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.)
> 10-17 11110 ### (So, 10 = 11110+000; etc.)
> 18-33 111110 #### (So, 18 = 111110+0000; etc.)
> etc.

this is Elias Gamma (negative)
From:Jim Leonard
Subject:Re: My RLE Method...
Date:16 Jan 2005 17:48:04 -0800
Eric D. Brown wrote:
> With this method, the only original run-length size with a larger
> encoded size is 2 (which encodes to 3 bits). Any original
run-lengths
> of 1,3,4 and 6 have an encoded size that is the same as the original
> run-length size. Any other original run-length is encoded with
compression.
>
> Again, how optimized is this method?

It depends on your source data, frankly. For example, your scheme
would do poorly on data that was comprised mainly of runs of 2.

There are many ways to skin a cat, and many ways to RLE data -- you
just have to pick the best method for your data, if you know it
beforehand. For example, here's how several methods of RLE fare with
the following original data "RTAAAASDEEEEE":

control char method, runs: (* represents character that signifies run)
RT*4ASD*5E

control char method, literals: (* represents character that signifies
literal)
*R*T4A*S*D5E

repeating char method: (double-char indicates a run)
RTAA4SDEE5

MS RLE 8-bit:
02RT4A02SD5E

PDF/TIFF/Packbits (~ represents number > 127):
1RT~A1SD~E

So which is best? I couldn't decide, so for a recent project I wrote a
library that tested each one on a block of data, then used the one that
output the smallest compressed block and flagged which method achieved
it.
From:cr88192
Subject:Re: My RLE Method...
Date:Tue, 18 Jan 2005 02:08:38 GMT

"Eric D. Brown" wrote in message
news:ZTDGd.7385$263.4166@fe06.lga...
>I don't know if this had been used before, but I'd like some feedback on
>how you can optimize RLE (and how optimized this is). This is an RLE
>enocder that I made:
>
> The # is a bit-place that represents the run-length size.
>
> Original RL - Flag - RL Encoded Size (without the Flag)
>
> 1 0 N/A
> 2-3 10 # (So, 2 = 10+0; 3 = 10+1)
> 4-5 110 # (So, 4 = 110+0; 5 = 110+1)
> 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.)
> 10-17 11110 ### (So, 10 = 11110+000; etc.)
> 18-33 111110 #### (So, 18 = 111110+0000; etc.)
> etc.
>
> With this method, the only original run-length size with a larger encoded
> size is 2 (which encodes to 3 bits). Any original run-lengths of 1,3,4
> and 6 have an encoded size that is the same as the original run-length
> size. Any other original run-length is encoded with compression.
>
> Again, how optimized is this method?
>
dunno.

it makes me think: how about adding each bit doubles the range?
0: 1
10: 2
110: 4
1110: 8
11110: 16
....

11110 10: 18 (7 bits)

effectively, each bit would cost a certain number of bits, and large or more
complex ones would cost more.

11110 110 10 0: 23 (11 bits)

which may not be acceptable.

other possibilities could include the fibonacci sequence:
0: 1
10: 2
110: 3
1110: 5
11110: 8
111110: 13
1111110: 21

111110 1110: 18 (10 bits)
1111110 10: 23 (9 bits)

or, one could try squares.

....

yes, lots of testing. one could try different sets of data, and compute the
average number of bits for each run, and try to narrow down what are "good"
ways of encoding length.


now, recently for my stuff I am considering focusing a little more on and
using (bytewise) rle and lz coding schemes, mostly under the reasoning that
they decode a lot faster than, eg, a huffman+rle or huffman+lz based algo,
even if they use a little more space...


eg: thoughts are a lz+rle bytewise encoder/decoder.
note: like my current bytewise rle encoder, the encoders would probably make
a pass over the data to choose the "magic" values, which are typically the
least likely possible (often the values are unused).

these are often included at the start of the stream to indicate which ones
they are. I am also considering the possibility of a "sanity magic".

eg:
"PDLZ"




but then again, I could be different, eg:
I could skip out on the 2 tags, and have a set of prefixed runs:
run prefix byte
lz (rid<0):
rle (rid>0):
raw (rid=0):

or, rid could be divided into 4 sets of 64.

....

so many possibilities for something so trivial, at least the algos are
pretty easy to write, some experimenting may work in determining which
approaches work well.
From:cr88192
Subject:followup (Re: My RLE Method...)
Date:Tue, 18 Jan 2005 04:17:12 GMT

"cr88192" wrote in message
news:Gc_Gd.656$rc4.503@fe07.usenetserver.com...
>
> "Eric D. Brown" wrote in message
> news:ZTDGd.7385$263.4166@fe06.lga...
>>I don't know if this had been used before, but I'd like some feedback on
>>how you can optimize RLE (and how optimized this is). This is an RLE
>>enocder that I made:
>>
>> The # is a bit-place that represents the run-length size.
>>
>> Original RL - Flag - RL Encoded Size (without the Flag)
>>
>> 1 0 N/A
>> 2-3 10 # (So, 2 = 10+0; 3 = 10+1)
>> 4-5 110 # (So, 4 = 110+0; 5 = 110+1)
>> 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.)
>> 10-17 11110 ### (So, 10 = 11110+000; etc.)
>> 18-33 111110 #### (So, 18 = 111110+0000; etc.)
>> etc.
>>
>> With this method, the only original run-length size with a larger encoded
>> size is 2 (which encodes to 3 bits). Any original run-lengths of 1,3,4
>> and 6 have an encoded size that is the same as the original run-length
>> size. Any other original run-length is encoded with compression.
>>
>> Again, how optimized is this method?
>>
> dunno.
>

crap, should have mentioned the "stair step" (or whatever it is called)
approach.

eg: you define a step size, eg, which is used for encoding runs.
certain values are used to indicate the next step.
....

eg:
3 bits:
0-6: 000-110 (3 bits)
7-13: 111 000-111 110 (6 bits)
14-21: 111 111 000-111 111 110 (9 bits)
....

4 bits:
0-14: 0000-1110 (4 bits)
15-29: 1111 0000-1111 1110 (8 bits)
30-44: 1111 1111 0000-1111 1111 1110 (12 bits)
....

5 bits:
0-62: 00000-11110
63-125: 11111 00000-11111 11110
....

more bits, more shallow, less bits, less shallow.
which is best depends on the "steepness" of the values.

variations of this approach are, eg, used in jpeg and similar.
From:Jim Leonard
Subject:Re: My RLE Method...
Date:18 Jan 2005 08:08:00 -0800
cr88192 wrote:
> now, recently for my stuff I am considering focusing a little more on
and
> using (bytewise) rle and lz coding schemes, mostly under the
reasoning that
> they decode a lot faster than, eg, a huffman+rle or huffman+lz based
algo,
> even if they use a little more space...

Yes, I didn't clarify in my original post that all of my compression
work is specialized for low-complexity hardware that does not do
bitwise operations terribly well. As a result, I have focused mainly
on byte-aligned codes and low-complexity decompression such as RLE and
LZ77-based schemes. My time spent on compression is roughly divided
10% to assembler-optimizing the decompression routines and 90% on
increasing the effectiveness of the compressor.

It may be worth noting that, in my quest to find the fastest
decompression routine possible for my architecture, there is only one
thing faster than RLE or LZ77: Compiled data. Meaning, the output of
my compressor is not run/length data or offset/length pairs, but rather
the actual bytecode necessary to reconstruct the source data (using
methods of both RLE and LZ77). Since my architecture uses 1- or 2-byte
opcodes for most instructions, it does not expand the compressed
representation much more than byte-aligned codes would. To expand the
data, I patch the destination pointer into the bytecode, and then jump
(JMP) to its entry point. The end of the data is a subroutine return
code (RET) so that program execution is passed back to the caller. The
speed advantage of this method over that of typical code is that the
compressed data does not have to be moved into and out of registers --
it is instead literals embedded the executed code itself -- saving
memory accesses.

Anyway, I thought the above might be interesting, as this is an RLE
thread, and although RLE is used more as a preprocessor for other
compression methods, some of us still use it because we need raw
decompression speed :-)
From:cr88192
Subject:Re: My RLE Method...
Date:Tue, 18 Jan 2005 18:11:11 GMT

"Jim Leonard" wrote in message
news:1106064480.373331.120810@f14g2000cwb.googlegroups.com...
> cr88192 wrote:
>> now, recently for my stuff I am considering focusing a little more on
> and
>> using (bytewise) rle and lz coding schemes, mostly under the
> reasoning that
>> they decode a lot faster than, eg, a huffman+rle or huffman+lz based
> algo,
>> even if they use a little more space...
>
> Yes, I didn't clarify in my original post that all of my compression
> work is specialized for low-complexity hardware that does not do
> bitwise operations terribly well. As a result, I have focused mainly
> on byte-aligned codes and low-complexity decompression such as RLE and
> LZ77-based schemes. My time spent on compression is roughly divided
> 10% to assembler-optimizing the decompression routines and 90% on
> increasing the effectiveness of the compressor.
>
yes, ok.

I am on a desktop, but I am impatient and want fast decompression for some
things. fully uncompressed data is wastes and incures a delay in reading
from disk (say, 10-30MB takes a bit longer to read-in than 1MB, note: fairly
low entropy), and more compressed data incures a delay in decompression.

I am trying for fast loading if possible, rle+lz77 makes sense.


> It may be worth noting that, in my quest to find the fastest
> decompression routine possible for my architecture, there is only one
> thing faster than RLE or LZ77: Compiled data. Meaning, the output of
> my compressor is not run/length data or offset/length pairs, but rather
> the actual bytecode necessary to reconstruct the source data (using
> methods of both RLE and LZ77). Since my architecture uses 1- or 2-byte
> opcodes for most instructions, it does not expand the compressed
> representation much more than byte-aligned codes would. To expand the
> data, I patch the destination pointer into the bytecode, and then jump
> (JMP) to its entry point. The end of the data is a subroutine return
> code (RET) so that program execution is passed back to the caller. The
> speed advantage of this method over that of typical code is that the
> compressed data does not have to be moved into and out of registers --
> it is instead literals embedded the executed code itself -- saving
> memory accesses.
>
not so good for pc's...
also makes the algo not so general.

> Anyway, I thought the above might be interesting, as this is an RLE
> thread, and although RLE is used more as a preprocessor for other
> compression methods, some of us still use it because we need raw
> decompression speed :-)
>
yes.

I use a hybrid, as afaik lz would generate less-than-optimal coding of runs
afaik. why have a lot of copying when I could have a fill?...

I skipped out on the header though.
basically I just give the 2 tags:

followed by the encoded data.

rle-tag len val
lz-tag len offs-low offs-high

or such...
From:Matt Mahoney
Subject:Re: My RLE Method...
Date:16 Jan 2005 17:12:59 -0800
Eric D. Brown wrote:
> I don't know if this had been used before, but I'd like some feedback
on
> how you can optimize RLE (and how optimized this is). This is an RLE

> enocder that I made:
>
> The # is a bit-place that represents the run-length size.
>
> Original RL - Flag - RL Encoded Size (without the Flag)
>
> 1 0 N/A
> 2-3 10 # (So, 2 = 10+0; 3 = 10+1)
> 4-5 110 # (So, 4 = 110+0; 5 = 110+1)
> 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.)
> 10-17 11110 ### (So, 10 = 11110+000; etc.)
> 18-33 111110 #### (So, 18 = 111110+0000; etc.)
> etc.
>
> With this method, the only original run-length size with a larger
> encoded size is 2 (which encodes to 3 bits). Any original
run-lengths
> of 1,3,4 and 6 have an encoded size that is the same as the original
> run-length size. Any other original run-length is encoded with
compression.
>
> Again, how optimized is this method?
>
> Respectfully,
> Eric D. Brown

It looks reasonable, but it depends on how run lengths are distributed
in your data. When you assign a code of length n bits, you are
implicitly assuming its probability is 2^-n.

-- Matt Mahoney
From:bybell at rocketmail.com
Subject:Re: My RLE Method...
Date:19 Jan 2005 17:29:37 -0800
Jim Leonard wrote:

> Yes, of course. But keep in mind this is a specialized application
> (60Hz display of animation data) and the method was derived because I
> needed the absolute fastest decompression speed. It was never meant
to
> be adapted for modern machines (other than maintaining compatibility)
> or non-x86 hardware. If portability is really a concern, a
transcoder
> could easily find the machine code patterns and recode the
run/length,
> literal, and offset/length pairs into a more generic and/or optimal
> code for storage/transmission.

Fun, google groups ate my message by asking to sign in I previewed
it...

Anyway, you might want to try multiple compression types per frame:

1) straight RLE
2) RLE of the XOR diff of two adjacent frames. Instead of doing the
read->modify->write for a run of zero XORs you increment the dest
pointer.

....then pick the compression type per frame which gives you the best
speed.

-t
From:Phil Carmody
Subject:Re: My RLE Method...
Date:18 Jan 2005 11:05:22 +0200
"Eric D. Brown" writes:
> I don't know if this had been used before, but I'd like some feedback
> on how you can optimize RLE (and how optimized this is). This is an
> RLE enocder that I made:
>
> The # is a bit-place that represents the run-length size.
>
> Original RL - Flag - RL Encoded Size (without the Flag)
>
> 1 0 N/A
> 2-3 10 # (So, 2 = 10+0; 3 = 10+1)
> 4-5 110 # (So, 4 = 110+0; 5 = 110+1)
> 6-9 1110 ## (So, 6 = 1110+00; 7 = 1110+01; etc.)
> 10-17 11110 ### (So, 10 = 11110+000; etc.)
> 18-33 111110 #### (So, 18 = 111110+0000; etc.)
> etc.
>
> With this method, the only original run-length size with a larger
> encoded size is 2 (which encodes to 3 bits). Any original run-lengths
> of 1,3,4 and 6 have an encoded size that is the same as the original
> run-length size. Any other original run-length is encoded with
> compression.
>
> Again, how optimized is this method?

Plot the -ve of log (base 2, conventionally) of the probability of
symbol (i.e. run length) n against the length of its code.

Now compare that graph with similar ones for other universal codes.
(Elial, Fibinacci, Golomb, etc. etc.)

Whichever is closest to a straight line x=y will be the coding scheme
that is best for your data.


Questions to c.c - where was that excellent page that documented
almost all common universal codes? It was maintained by a c.c.
semi-regular, IIRC? Google can't find it (or I can't, should I say).

Phil
--
The answer to life's mystery is simple and direct:
Sex and death. -- Ian 'Lemmy' Kilminster.
From:Jim Leonard
Subject:Re: My RLE Method...
Date:18 Jan 2005 15:48:36 -0800
cr88192 wrote:
> > It may be worth noting that, in my quest to find the fastest
> > decompression routine possible for my architecture, there is only
one
> > thing faster than RLE or LZ77: Compiled data. Meaning, the output
of
> > my compressor is not run/length data or offset/length pairs, but
rather
> > the actual bytecode necessary to reconstruct the source data (using
> > methods of both RLE and LZ77). Since my architecture uses 1- or
2-byte
> > opcodes for most instructions, it does not expand the compressed
> > representation much more than byte-aligned codes would. To expand
the
> > data, I patch the destination pointer into the bytecode, and then
jump
> > (JMP) to its entry point. The end of the data is a subroutine
return
> > code (RET) so that program execution is passed back to the caller.
The
> > speed advantage of this method over that of typical code is that
the
> > compressed data does not have to be moved into and out of registers
--
> > it is instead literals embedded the executed code itself -- saving
> > memory accesses.
> >
> not so good for pc's...

Actually, my target *is* a PC -- a 4.77MHz 8088 IBM Model 5150 from
1982, to be exact. But I know what you meant :-)

> also makes the algo not so general.

Actually, it still is, because the output of the compressor (really a
"compiler") takes one of the following two forms (assume DS:SI and
ES:DI are already set up for illustration):

# repeating a value
MOV AX,[word value to repeat]
MOV CX,[repeat count, multiple of 2]
REP STOSW

or:

# copying previously-seen data
MOV CX,[repeat count, multiple of 2]
REP MOVSW

or:

# outputting a literal
MOV [DI],[word literal]

Since the sequences are always of these forms, it is easily possible to
just parse the machine code as a "flag" and run the data through a
generic decompressor.
From:cr88192
Subject:Re: My RLE Method...
Date:Wed, 19 Jan 2005 03:26:17 GMT

"Jim Leonard" wrote in message
news:1106092116.504496.122120@c13g2000cwb.googlegroups.com...
> cr88192 wrote:
>> > It may be worth noting that, in my quest to find the fastest
>> > decompression routine possible for my architecture, there is only
> one
>> > thing faster than RLE or LZ77: Compiled data. Meaning, the output
> of
>> > my compressor is not run/length data or offset/length pairs, but
> rather
>> > the actual bytecode necessary to reconstruct the source data (using
>> > methods of both RLE and LZ77). Since my architecture uses 1- or
> 2-byte
>> > opcodes for most instructions, it does not expand the compressed
>> > representation much more than byte-aligned codes would. To expand
> the
>> > data, I patch the destination pointer into the bytecode, and then
> jump
>> > (JMP) to its entry point. The end of the data is a subroutine
> return
>> > code (RET) so that program execution is passed back to the caller.
> The
>> > speed advantage of this method over that of typical code is that
> the
>> > compressed data does not have to be moved into and out of registers
> --
>> > it is instead literals embedded the executed code itself -- saving
>> > memory accesses.
>> >
>> not so good for pc's...
>
> Actually, my target *is* a PC -- a 4.77MHz 8088 IBM Model 5150 from
> 1982, to be exact. But I know what you meant :-)
>
oh, ok...

>> also makes the algo not so general.
>
> Actually, it still is, because the output of the compressor (really a
> "compiler") takes one of the following two forms (assume DS:SI and
> ES:DI are already set up for illustration):
>
> # repeating a value
> MOV AX,[word value to repeat]
> MOV CX,[repeat count, multiple of 2]
> REP STOSW
>
> or:
>
> # copying previously-seen data
> MOV CX,[repeat count, multiple of 2]
> REP MOVSW
>
> or:
>
> # outputting a literal
> MOV [DI],[word literal]
>
> Since the sequences are always of these forms, it is easily possible to
> just parse the machine code as a "flag" and run the data through a
> generic decompressor.
>
ok.

I still don't like the idea so much as "normal" data compression though,
especially since it would not be really useful on something other than an
8088. the extra processing might make it otherwise slower than a normal
rle/lz77 decoder on a modern pc.

or such.

I have classes that start today...
From:Jim Leonard
Subject:Re: My RLE Method...
Date:19 Jan 2005 10:00:22 -0800
cr88192 wrote:
> I still don't like the idea so much as "normal" data compression
though,
> especially since it would not be really useful on something other
than an
> 8088. the extra processing might make it otherwise slower than a
normal
> rle/lz77 decoder on a modern pc.

Yes, of course. But keep in mind this is a specialized application
(60Hz display of animation data) and the method was derived because I
needed the absolute fastest decompression speed. It was never meant to
be adapted for modern machines (other than maintaining compatibility)
or non-x86 hardware. If portability is really a concern, a transcoder
could easily find the machine code patterns and recode the run/length,
literal, and offset/length pairs into a more generic and/or optimal
code for storage/transmission.

Anyway, it's nice to have an opportunity to chime in here now and then.
With heavyweights like Matt Mahoney and Malcom Taylor and others
dropping in, I feel quite outclassed most of the time :-) Most of the
discussion here is on improving compression performance (as opposed to
execution performance) so my ears perk up whenever someone mentions RLE
or LZ77.
From:cr88192
Subject:Re: My RLE Method...
Date:Thu, 20 Jan 2005 01:12:08 GMT

"Jim Leonard" wrote in message
news:1106157622.444332.128500@z14g2000cwz.googlegroups.com...
> cr88192 wrote:
>> I still don't like the idea so much as "normal" data compression
> though,
>> especially since it would not be really useful on something other
> than an
>> 8088. the extra processing might make it otherwise slower than a
> normal
>> rle/lz77 decoder on a modern pc.
>
> Yes, of course. But keep in mind this is a specialized application
> (60Hz display of animation data) and the method was derived because I
> needed the absolute fastest decompression speed. It was never meant to
> be adapted for modern machines (other than maintaining compatibility)
> or non-x86 hardware. If portability is really a concern, a transcoder
> could easily find the machine code patterns and recode the run/length,
> literal, and offset/length pairs into a more generic and/or optimal
> code for storage/transmission.
>
yes, makes sense.

my motivation is more trying to get fast loading of resources from a
"cache". eg: I load input data and build cached versions, for which later it
should be possible to load and decompress quickly to avoid waiting all that
long.

a lot of the data that is compressed is things like lightmaps, generated
graphics, visability bitmaps, misc other data, ... eg: stuff that tends to
take up a lot of extra space.

> Anyway, it's nice to have an opportunity to chime in here now and then.
> With heavyweights like Matt Mahoney and Malcom Taylor and others
> dropping in, I feel quite outclassed most of the time :-) Most of the
> discussion here is on improving compression performance (as opposed to
> execution performance) so my ears perk up whenever someone mentions RLE
> or LZ77.
>
yes, speed is a relevant issue imo.

eg, was doing arithmatic coding at one point, and it was fairly slow and I
could do little about it without stepping into patented-land.
so, I started using a huffman like coding, which was tolerably fast for most
things. for these I had written some rle and lz encoders, however, these had
assumed a 12 bit coding space (using shorts for input/output with the
entropy coder).

I thus started as a base using one of my somewhat older rle coders, and
taking the lz77 stuff from the one I was using for entropy coding and
modifying it to work in the new function (necessary because the 2 had used
the vars differently).

alas, given the start of college classes I haven't really had time to test
it yet (among other things I have been meaning to mess with).

sometime I was also considering getting around to messing with lz78, but
haven't yet...
   

Copyright © 2006 knowledge-database   -   All rights reserved