knowledge-database (beta)

Current group: comp.compression.

huffman coding over and over again

huffman coding over and over again  
Rich Koup
 Re: huffman coding over and over again  
Matt Mahoney
 Re: huffman coding over and over again  
Rich Koup
 Re: huffman coding over and over again  
Popai
 Re: huffman coding over and over again  
Willem
 Re: huffman coding over and over again  
Rich Koup
 Re: huffman coding over and over again  
Willem
 Re: huffman coding over and over again  
Phil Carmody
 Re: huffman coding over and over again  
David A. Scott
 Re: huffman coding over and over again  
Phil Carmody
 Re: huffman coding over and over again  
Rich Koup
 Re: huffman coding over and over again  
David A. Scott
 Re: huffman coding over and over again  
Willem
 Re: huffman coding over and over again  
Phil Carmody
 Re: huffman coding over and over again  
Rich Koup
 Re: huffman coding over and over again  
Willem
 Re: huffman coding over and over again  
cr88192
From:Rich Koup
Subject:huffman coding over and over again
Date:Sat, 22 Jan 2005 18:30:46 -0500
Is it possible to do huffman coding over and over again? Maybe try to flip
the even or odd bits between each recoding? Maybe try other transformations
between each coding to shuffle the data so it seems as if it's a new piece
of data so it can be recoded again.

When we want to decompress the data, we would do a huffman decode then
shuffle it back to the way it was, then decode again...etc...

Is it possible?



Thanks
From:Matt Mahoney
Subject:Re: huffman coding over and over again
Date:22 Jan 2005 16:14:01 -0800

Rich Koup wrote:
> Is it possible to do huffman coding over and over again? Maybe try to
flip
> the even or odd bits between each recoding? Maybe try other
transformations
> between each coding to shuffle the data so it seems as if it's a new
piece
> of data so it can be recoded again.
>
> When we want to decompress the data, we would do a huffman decode
then
> shuffle it back to the way it was, then decode again...etc...
>
> Is it possible?

Yes, but it would be too slow for practical use. That's why Huffman
codes aren't used for high order models like PPM. An arithmetic code
would be faster, besides giving better compression.

-- Matt Mahoney
From:Rich Koup
Subject:Re: huffman coding over and over again
Date:Sat, 22 Jan 2005 19:24:55 -0500
What are the compression techniques that are similar to what I described in
the orginal post called (if they have a name)? So the problem with it is the
time it takes to compress and decompress, right? So practically it is
possible, but would take a lot of time. Also, correct me if I'm wrong, did
you say that arithmetic codes compress better than the technique I
described?


Thank you



"Matt Mahoney" wrote in message
news:1106439240.979808.291000@c13g2000cwb.googlegroups.com...
>
> Rich Koup wrote:
> > Is it possible to do huffman coding over and over again? Maybe try to
> flip
> > the even or odd bits between each recoding? Maybe try other
> transformations
> > between each coding to shuffle the data so it seems as if it's a new
> piece
> > of data so it can be recoded again.
> >
> > When we want to decompress the data, we would do a huffman decode
> then
> > shuffle it back to the way it was, then decode again...etc...
> >
> > Is it possible?
>
> Yes, but it would be too slow for practical use. That's why Huffman
> codes aren't used for high order models like PPM. An arithmetic code
> would be faster, besides giving better compression.
>
> -- Matt Mahoney
>
From:Popai
Subject:Re: huffman coding over and over again
Date:23 Jan 2005 23:24:20 -0800
First of all please excuse my poor English, I'm from Romania :).

Suppose your idea works(but it doesn't). You need a method to specify
at each "recompression" step which is the transformation you want to
use (the same transformation at each step will not work for shure), and
how many "recompressions" you have done. This will add extradata or
overhead to your result (which will not be very small as you might
imagine) and will almost anulate the result of repeated compression.
Try to read something about Kolmogorov complexity and algorithmic
information theory and probably you will find more intuitive
explenations why your method doesn't works.

There is an interesting paper which explains that very well ( I mean
perfect English too :)) ). It is actually a section from "Data
Compression: The Complete Reference" by Davis Solomon, section removed
from the 3dr edition of that excelent book. You can find the paper
at:http://www.davidsalomon.name/DC3advertis/counting.arg.pdf
Best regards,
Popai
From:Willem
Subject:Re: huffman coding over and over again
Date:Sun, 23 Jan 2005 00:17:26 +0000 (UTC)
Rich wrote:
) Is it possible to do huffman coding over and over again? Maybe try to flip
) the even or odd bits between each recoding? Maybe try other transformations
) between each coding to shuffle the data so it seems as if it's a new piece
) of data so it can be recoded again.

Huffman by itself doesn't compress data. It's just an encoding.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
From:Rich Koup
Subject:Re: huffman coding over and over again
Date:Sat, 22 Jan 2005 19:25:57 -0500
Right, but a side-effect from the coding, is that it does actually compress
data when after it encodes the data in most of the cases. Right?

Thanks


"Willem" wrote in message
news:slrncv5r8m.2jsr.willem@toad.stack.nl...
> Rich wrote:
> ) Is it possible to do huffman coding over and over again? Maybe try to
flip
> ) the even or odd bits between each recoding? Maybe try other
transformations
> ) between each coding to shuffle the data so it seems as if it's a new
piece
> ) of data so it can be recoded again.
>
> Huffman by itself doesn't compress data. It's just an encoding.
>
>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT
From:Willem
Subject:Re: huffman coding over and over again
Date:Sun, 23 Jan 2005 01:05:06 +0000 (UTC)
Rich wrote:
) "Willem" wrote in message
) news:slrncv5r8m.2jsr.willem@toad.stack.nl...
)> Huffman by itself doesn't compress data. It's just an encoding.
)
) Right, but a side-effect from the coding, is that it does actually compress
) data when after it encodes the data in most of the cases. Right?

Huffman doesn't encode 'data'. It encodes symbols, that usually have
different probabilities. Stream of symbols with assigned probabilities
in, stream of bits out.

How are you going to make symbols-with-assigned-probabilities from that
stream of bits again ?


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
From:Phil Carmody
Subject:Re: huffman coding over and over again
Date:23 Jan 2005 14:11:43 +0200
"Rich Koup" writes:

> Right, but a side-effect from the coding, is that it does actually compress
> data when after it encodes the data in most of the cases. Right?

Assuming you've communicated in advance (maybe by standard, maybe as
a header/prefix to the message) the particular encoding used, the
_majority_ of cases will cause expansion, not positive compression.

Compression functions are really expansion functions with a small
but useful set of failure cases. The Kraft inequality explains why.

Phil
--
Excerpt from Geoff Bulter's Proscriptive Dictionary:
aaa Don't use this, there's no such word
aaaa Don't use this, there's no such word
aaaaa Don't use this, there's no such word
From:David A. Scott
Subject:Re: huffman coding over and over again
Date:23 Jan 2005 14:35:32 GMT
Phil Carmody wrote in
news:87u0p8ibuo.fsf@nonospaz.fatphil.org:

>
> Compression functions are really expansion functions with a small
> but useful set of failure cases.
>
>

That to good to be something you thought of were did you get this
gem I like it.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
From:Phil Carmody
Subject:Re: huffman coding over and over again
Date:23 Jan 2005 17:03:43 +0200
"David A. Scott" writes:
> Phil Carmody wrote in
> news:87u0p8ibuo.fsf@nonospaz.fatphil.org:
>
> > Compression functions are really expansion functions with a small
> > but useful set of failure cases.
>
> That to good to be something you thought of were did you get this
> gem I like it.

I first came up with it (or something very similar) a few years
ago (here on comp.compression). I don't know if it has been said
prior to that by someone else. I am quite proud of it, I must
say :-).

Phil
--
Excerpt from Geoff Bulter's Proscriptive Dictionary:
aaa Don't use this, there's no such word
aaaa Don't use this, there's no such word
aaaaa Don't use this, there's no such word
From:Rich Koup
Subject:Re: huffman coding over and over again
Date:Sun, 23 Jan 2005 11:43:02 -0500
hehehe... So the times it actually compresses are the cases where it
actually fails...

Ok, I understand now. Maybe the technique I described can be applied to the
Zip compression algorithm? Zip then do a transformation, zip again,
transform... etc...

When I say transformation, I mean flipping every nth bit.. or some other
fancy method...

What would happen then, would it work?


Thanks



"Phil Carmody" wrote in message
news:87zmz0gpbk.fsf@nonospaz.fatphil.org...
> "David A. Scott" writes:
> > Phil Carmody wrote in
> > news:87u0p8ibuo.fsf@nonospaz.fatphil.org:
> >
> > > Compression functions are really expansion functions with a small
> > > but useful set of failure cases.
> >
> > That to good to be something you thought of were did you get this
> > gem I like it.
>
> I first came up with it (or something very similar) a few years
> ago (here on comp.compression). I don't know if it has been said
> prior to that by someone else. I am quite proud of it, I must
> say :-).
>
> Phil
> --
> Excerpt from Geoff Bulter's Proscriptive Dictionary:
> aaa Don't use this, there's no such word
> aaaa Don't use this, there's no such word
> aaaaa Don't use this, there's no such word
From:David A. Scott
Subject:Re: huffman coding over and over again
Date:23 Jan 2005 16:47:29 GMT
"Rich Koup" wrote in
news:hNmdnceyM4eKSW7cRVn-vw@rogers.com:

> hehehe... So the times it actually compresses are the cases where it
> actually fails...
>
> Ok, I understand now. Maybe the technique I described can be applied
> to the Zip compression algorithm? Zip then do a transformation, zip
> again, transform... etc...
>
> When I say transformation, I mean flipping every nth bit.. or some
> other fancy method...
>
> What would happen then, would it work?
>
>
> Thanks
>
>


Depends on defination of work. The fact is if the first
compression was good then you have mostly a random looking file.
Since most random files don't compress it will most likely get
longer every pass. Did you already forget that a compressor is
nothing but an expander with a small limited set of failures that
actually compress.



David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
From:Willem
Subject:Re: huffman coding over and over again
Date:Sun, 23 Jan 2005 16:52:18 +0000 (UTC)
Rich wrote:
) hehehe... So the times it actually compresses are the cases where it
) actually fails...
)
) Ok, I understand now. Maybe the technique I described can be applied to the
) Zip compression algorithm? Zip then do a transformation, zip again,
) transform... etc...
)
) When I say transformation, I mean flipping every nth bit.. or some other
) fancy method...
)
) What would happen then, would it work?

No. Any compression program works because the data it targets has a
lot of redundancy. If it's a good compressor, it removes most of the
redundancy the first time around. Transformations like you mention
cannot make the data more redundant, only in rare cases it could
expose some hidden redundancies. But the thing with redundancy
is that it usually isn't that hidden.

Anyway, if your scheme would work on some compressor, then all that
means is that that compressor wasn't very good to begin with.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
From:Phil Carmody
Subject:Re: huffman coding over and over again
Date:23 Jan 2005 19:44:13 +0200
"Rich Koup" writes:
> hehehe... So the times it actually compresses are the cases where it
> actually fails...

If viewed with a twisted sense of humour and a thorough knowledge
of the Kraft inequality, yes. My comment is not necessarily a useful
one for teaching newbies though. (But it might be, who knows?)

> Ok, I understand now. Maybe the technique I described can be applied to the
> Zip compression algorithm? Zip then do a transformation, zip again,
> transform... etc...
>
> When I say transformation, I mean flipping every nth bit.. or some other
> fancy method...
>
> What would happen then, would it work?

If there's no _actual_ _concrete_ reason for the transformation
to make the data more compressible, then it won't do anything useful.

The most famous transformations have concrete reasons why they work.
BWT changes how predictable patterns occur, which makes them easier
to find by common modellers. The predictable patters _were_ there in
the original, it's just that they've been made easier to exploit
(for real reasons based on knowledge of the types of files that people
most commonly work with).

Just fiddling with bits on the hope that it might become compressible
is not such a transformation. It's "compression by luck", which ranks
alongside "compression by coincidence" as a futile technique. If you
don't know why you shouldn't be trying it, then you don't know enough
to be able to get it to work, so you shouldn't bother trying it. If
you know enough to know why you shouldn't be trying it, then you know
to not even bother trying it.


Phil
--
Excerpt from Geoff Bulter's Proscriptive Dictionary:
aaa Don't use this, there's no such word
aaaa Don't use this, there's no such word
aaaaa Don't use this, there's no such word
From:Rich Koup
Subject:Re: huffman coding over and over again
Date:Sun, 23 Jan 2005 15:17:50 -0500
Alright. It's based on luck. The luck that the actual transformation would
transform the compressed data to something that has redundancies and
repitions. So, here is another idea, 1 (256) or 2 (65536) bytes that could
describe either 256 or 65536 different tranforms. The program could try
which of these transforms can create the most redudancy and then act on that
transfrom and append the 1 or 2 bytes that describe which transformation
that was used into the transformed data. This offcourse is still based on
luck that any of the different kinds of transforms can create redundancy. I
also understand that this is a stupid idea because it would take tons and
tons of time to try all the different kinds of transforms..etc.. not very
practical.

I guess I should abandon this idea. :)

I learned a lot from your replies. Thank you all.



"Phil Carmody" wrote in message
news:87mzv0ghw2.fsf@nonospaz.fatphil.org...
> "Rich Koup" writes:
> > hehehe... So the times it actually compresses are the cases where it
> > actually fails...
>
> If viewed with a twisted sense of humour and a thorough knowledge
> of the Kraft inequality, yes. My comment is not necessarily a useful
> one for teaching newbies though. (But it might be, who knows?)
>
> > Ok, I understand now. Maybe the technique I described can be applied to
the
> > Zip compression algorithm? Zip then do a transformation, zip again,
> > transform... etc...
> >
> > When I say transformation, I mean flipping every nth bit.. or some other
> > fancy method...
> >
> > What would happen then, would it work?
>
> If there's no _actual_ _concrete_ reason for the transformation
> to make the data more compressible, then it won't do anything useful.
>
> The most famous transformations have concrete reasons why they work.
> BWT changes how predictable patterns occur, which makes them easier
> to find by common modellers. The predictable patters _were_ there in
> the original, it's just that they've been made easier to exploit
> (for real reasons based on knowledge of the types of files that people
> most commonly work with).
>
> Just fiddling with bits on the hope that it might become compressible
> is not such a transformation. It's "compression by luck", which ranks
> alongside "compression by coincidence" as a futile technique. If you
> don't know why you shouldn't be trying it, then you don't know enough
> to be able to get it to work, so you shouldn't bother trying it. If
> you know enough to know why you shouldn't be trying it, then you know
> to not even bother trying it.
>
>
> Phil
> --
> Excerpt from Geoff Bulter's Proscriptive Dictionary:
> aaa Don't use this, there's no such word
> aaaa Don't use this, there's no such word
> aaaaa Don't use this, there's no such word
From:Willem
Subject:Re: huffman coding over and over again
Date:Sun, 23 Jan 2005 23:58:02 +0000 (UTC)
Rich wrote:
) Alright. It's based on luck. The luck that the actual transformation would
) transform the compressed data to something that has redundancies and
) repitions. So, here is another idea, 1 (256) or 2 (65536) bytes that could
) describe either 256 or 65536 different tranforms. The program could try
) which of these transforms can create the most redudancy and then act on that
) transfrom and append the 1 or 2 bytes that describe which transformation
) that was used into the transformed data. This offcourse is still based on
) luck that any of the different kinds of transforms can create redundancy. I
) also understand that this is a stupid idea because it would take tons and
) tons of time to try all the different kinds of transforms..etc.. not very
) practical.

If you try 65536 different transforms, then the best of those will, on
average, give you a gain of at most 2 bytes.

This is a corollary of the Counting Theorem.

I suggest you find out what that theorem is, and try to understand it.
That is the very first step you should take when trying to think of a
compression technique.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
From:cr88192
Subject:Re: huffman coding over and over again
Date:Sun, 23 Jan 2005 00:27:31 GMT

"Rich Koup" wrote in message
news:19SdnSTTh7-Uf2_cRVn-tQ@rogers.com...
> Is it possible to do huffman coding over and over again? Maybe try to flip
> the even or odd bits between each recoding? Maybe try other
> transformations
> between each coding to shuffle the data so it seems as if it's a new piece
> of data so it can be recoded again.
>
> When we want to decompress the data, we would do a huffman decode then
> shuffle it back to the way it was, then decode again...etc...
>
> Is it possible?
>
don't think so.

for the input to huffman a difference in the probabilities for different
values is assumed, and this is used to encode the data.
for the output, in most cases all the values should have near-equal
probabilities (or at least not different enough to make much use of).

bit shuffling or transforms are unlikely to fix this, as the results are
still likely to be chaotic and fairly evenly distributed.


others may have other thoughts though...
   

Copyright © 2006 knowledge-database   -   All rights reserved