|
|
 | | From: | Rich Koup | | Subject: | huffman coding over and over again | | Date: | Sat, 22 Jan 2005 18:30:46 -0500 |
|
|
 | Is it possible to do huffman coding over and over again? Maybe try to flip the even or odd bits between each recoding? Maybe try other transformations between each coding to shuffle the data so it seems as if it's a new piece of data so it can be recoded again.
When we want to decompress the data, we would do a huffman decode then shuffle it back to the way it was, then decode again...etc...
Is it possible?
Thanks
|
|
 | | From: | Matt Mahoney | | Subject: | Re: huffman coding over and over again | | Date: | 22 Jan 2005 16:14:01 -0800 |
|
|
 | Rich Koup wrote: > Is it possible to do huffman coding over and over again? Maybe try to flip > the even or odd bits between each recoding? Maybe try other transformations > between each coding to shuffle the data so it seems as if it's a new piece > of data so it can be recoded again. > > When we want to decompress the data, we would do a huffman decode then > shuffle it back to the way it was, then decode again...etc... > > Is it possible?
Yes, but it would be too slow for practical use. That's why Huffman codes aren't used for high order models like PPM. An arithmetic code would be faster, besides giving better compression.
-- Matt Mahoney
|
|
 | | From: | Rich Koup | | Subject: | Re: huffman coding over and over again | | Date: | Sat, 22 Jan 2005 19:24:55 -0500 |
|
|
 | What are the compression techniques that are similar to what I described in the orginal post called (if they have a name)? So the problem with it is the time it takes to compress and decompress, right? So practically it is possible, but would take a lot of time. Also, correct me if I'm wrong, did you say that arithmetic codes compress better than the technique I described?
Thank you
"Matt Mahoney" wrote in message news:1106439240.979808.291000@c13g2000cwb.googlegroups.com... > > Rich Koup wrote: > > Is it possible to do huffman coding over and over again? Maybe try to > flip > > the even or odd bits between each recoding? Maybe try other > transformations > > between each coding to shuffle the data so it seems as if it's a new > piece > > of data so it can be recoded again. > > > > When we want to decompress the data, we would do a huffman decode > then > > shuffle it back to the way it was, then decode again...etc... > > > > Is it possible? > > Yes, but it would be too slow for practical use. That's why Huffman > codes aren't used for high order models like PPM. An arithmetic code > would be faster, besides giving better compression. > > -- Matt Mahoney >
|
|
 | | From: | Popai | | Subject: | Re: huffman coding over and over again | | Date: | 23 Jan 2005 23:24:20 -0800 |
|
|
 | First of all please excuse my poor English, I'm from Romania :).
Suppose your idea works(but it doesn't). You need a method to specify at each "recompression" step which is the transformation you want to use (the same transformation at each step will not work for shure), and how many "recompressions" you have done. This will add extradata or overhead to your result (which will not be very small as you might imagine) and will almost anulate the result of repeated compression. Try to read something about Kolmogorov complexity and algorithmic information theory and probably you will find more intuitive explenations why your method doesn't works.
There is an interesting paper which explains that very well ( I mean perfect English too :)) ). It is actually a section from "Data Compression: The Complete Reference" by Davis Solomon, section removed from the 3dr edition of that excelent book. You can find the paper at:http://www.davidsalomon.name/DC3advertis/counting.arg.pdf Best regards, Popai
|
|
 | | From: | Willem | | Subject: | Re: huffman coding over and over again | | Date: | Sun, 23 Jan 2005 00:17:26 +0000 (UTC) |
|
|
 | Rich wrote: ) Is it possible to do huffman coding over and over again? Maybe try to flip ) the even or odd bits between each recoding? Maybe try other transformations ) between each coding to shuffle the data so it seems as if it's a new piece ) of data so it can be recoded again.
Huffman by itself doesn't compress data. It's just an encoding.
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
|
|
 | | From: | Rich Koup | | Subject: | Re: huffman coding over and over again | | Date: | Sat, 22 Jan 2005 19:25:57 -0500 |
|
|
 | Right, but a side-effect from the coding, is that it does actually compress data when after it encodes the data in most of the cases. Right?
Thanks
"Willem" wrote in message news:slrncv5r8m.2jsr.willem@toad.stack.nl... > Rich wrote: > ) Is it possible to do huffman coding over and over again? Maybe try to flip > ) the even or odd bits between each recoding? Maybe try other transformations > ) between each coding to shuffle the data so it seems as if it's a new piece > ) of data so it can be recoded again. > > Huffman by itself doesn't compress data. It's just an encoding. > > > SaSW, Willem > -- > Disclaimer: I am in no way responsible for any of the statements > made in the above text. For all I know I might be > drugged or something.. > No I'm not paranoid. You all think I'm paranoid, don't you ! > #EOT
|
|
 | | From: | Willem | | Subject: | Re: huffman coding over and over again | | Date: | Sun, 23 Jan 2005 01:05:06 +0000 (UTC) |
|
|
 | Rich wrote: ) "Willem" wrote in message ) news:slrncv5r8m.2jsr.willem@toad.stack.nl... )> Huffman by itself doesn't compress data. It's just an encoding. ) ) Right, but a side-effect from the coding, is that it does actually compress ) data when after it encodes the data in most of the cases. Right?
Huffman doesn't encode 'data'. It encodes symbols, that usually have different probabilities. Stream of symbols with assigned probabilities in, stream of bits out.
How are you going to make symbols-with-assigned-probabilities from that stream of bits again ?
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
|
|
 | | From: | Phil Carmody | | Subject: | Re: huffman coding over and over again | | Date: | 23 Jan 2005 14:11:43 +0200 |
|
|
 | "Rich Koup" writes:
> Right, but a side-effect from the coding, is that it does actually compress > data when after it encodes the data in most of the cases. Right?
Assuming you've communicated in advance (maybe by standard, maybe as a header/prefix to the message) the particular encoding used, the _majority_ of cases will cause expansion, not positive compression.
Compression functions are really expansion functions with a small but useful set of failure cases. The Kraft inequality explains why.
Phil -- Excerpt from Geoff Bulter's Proscriptive Dictionary: aaa Don't use this, there's no such word aaaa Don't use this, there's no such word aaaaa Don't use this, there's no such word
|
|
 | | From: | David A. Scott | | Subject: | Re: huffman coding over and over again | | Date: | 23 Jan 2005 14:35:32 GMT |
|
|
 | Phil Carmody wrote in news:87u0p8ibuo.fsf@nonospaz.fatphil.org:
> > Compression functions are really expansion functions with a small > but useful set of failure cases. > >
That to good to be something you thought of were did you get this gem I like it.
David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"
|
|
 | | From: | Phil Carmody | | Subject: | Re: huffman coding over and over again | | Date: | 23 Jan 2005 17:03:43 +0200 |
|
|
 | "David A. Scott" writes: > Phil Carmody wrote in > news:87u0p8ibuo.fsf@nonospaz.fatphil.org: > > > Compression functions are really expansion functions with a small > > but useful set of failure cases. > > That to good to be something you thought of were did you get this > gem I like it.
I first came up with it (or something very similar) a few years ago (here on comp.compression). I don't know if it has been said prior to that by someone else. I am quite proud of it, I must say :-).
Phil -- Excerpt from Geoff Bulter's Proscriptive Dictionary: aaa Don't use this, there's no such word aaaa Don't use this, there's no such word aaaaa Don't use this, there's no such word
|
|
 | | From: | Rich Koup | | Subject: | Re: huffman coding over and over again | | Date: | Sun, 23 Jan 2005 11:43:02 -0500 |
|
|
 | hehehe... So the times it actually compresses are the cases where it actually fails...
Ok, I understand now. Maybe the technique I described can be applied to the Zip compression algorithm? Zip then do a transformation, zip again, transform... etc...
When I say transformation, I mean flipping every nth bit.. or some other fancy method...
What would happen then, would it work?
Thanks
"Phil Carmody" wrote in message news:87zmz0gpbk.fsf@nonospaz.fatphil.org... > "David A. Scott" writes: > > Phil Carmody wrote in > > news:87u0p8ibuo.fsf@nonospaz.fatphil.org: > > > > > Compression functions are really expansion functions with a small > > > but useful set of failure cases. > > > > That to good to be something you thought of were did you get this > > gem I like it. > > I first came up with it (or something very similar) a few years > ago (here on comp.compression). I don't know if it has been said > prior to that by someone else. I am quite proud of it, I must > say :-). > > Phil > -- > Excerpt from Geoff Bulter's Proscriptive Dictionary: > aaa Don't use this, there's no such word > aaaa Don't use this, there's no such word > aaaaa Don't use this, there's no such word
|
|
 | | From: | David A. Scott | | Subject: | Re: huffman coding over and over again | | Date: | 23 Jan 2005 16:47:29 GMT |
|
|
 | "Rich Koup" wrote in news:hNmdnceyM4eKSW7cRVn-vw@rogers.com:
> hehehe... So the times it actually compresses are the cases where it > actually fails... > > Ok, I understand now. Maybe the technique I described can be applied > to the Zip compression algorithm? Zip then do a transformation, zip > again, transform... etc... > > When I say transformation, I mean flipping every nth bit.. or some > other fancy method... > > What would happen then, would it work? > > > Thanks > >
Depends on defination of work. The fact is if the first compression was good then you have mostly a random looking file. Since most random files don't compress it will most likely get longer every pass. Did you already forget that a compressor is nothing but an expander with a small limited set of failures that actually compress.
David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"
|
|
 | | From: | Willem | | Subject: | Re: huffman coding over and over again | | Date: | Sun, 23 Jan 2005 16:52:18 +0000 (UTC) |
|
|
 | Rich wrote: ) hehehe... So the times it actually compresses are the cases where it ) actually fails... ) ) Ok, I understand now. Maybe the technique I described can be applied to the ) Zip compression algorithm? Zip then do a transformation, zip again, ) transform... etc... ) ) When I say transformation, I mean flipping every nth bit.. or some other ) fancy method... ) ) What would happen then, would it work?
No. Any compression program works because the data it targets has a lot of redundancy. If it's a good compressor, it removes most of the redundancy the first time around. Transformations like you mention cannot make the data more redundant, only in rare cases it could expose some hidden redundancies. But the thing with redundancy is that it usually isn't that hidden.
Anyway, if your scheme would work on some compressor, then all that means is that that compressor wasn't very good to begin with.
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
|
|
 | | From: | Phil Carmody | | Subject: | Re: huffman coding over and over again | | Date: | 23 Jan 2005 19:44:13 +0200 |
|
|
 | "Rich Koup" writes: > hehehe... So the times it actually compresses are the cases where it > actually fails...
If viewed with a twisted sense of humour and a thorough knowledge of the Kraft inequality, yes. My comment is not necessarily a useful one for teaching newbies though. (But it might be, who knows?) > Ok, I understand now. Maybe the technique I described can be applied to the > Zip compression algorithm? Zip then do a transformation, zip again, > transform... etc... > > When I say transformation, I mean flipping every nth bit.. or some other > fancy method... > > What would happen then, would it work?
If there's no _actual_ _concrete_ reason for the transformation to make the data more compressible, then it won't do anything useful.
The most famous transformations have concrete reasons why they work. BWT changes how predictable patterns occur, which makes them easier to find by common modellers. The predictable patters _were_ there in the original, it's just that they've been made easier to exploit (for real reasons based on knowledge of the types of files that people most commonly work with).
Just fiddling with bits on the hope that it might become compressible is not such a transformation. It's "compression by luck", which ranks alongside "compression by coincidence" as a futile technique. If you don't know why you shouldn't be trying it, then you don't know enough to be able to get it to work, so you shouldn't bother trying it. If you know enough to know why you shouldn't be trying it, then you know to not even bother trying it.
Phil -- Excerpt from Geoff Bulter's Proscriptive Dictionary: aaa Don't use this, there's no such word aaaa Don't use this, there's no such word aaaaa Don't use this, there's no such word
|
|
 | | From: | Rich Koup | | Subject: | Re: huffman coding over and over again | | Date: | Sun, 23 Jan 2005 15:17:50 -0500 |
|
|
 | Alright. It's based on luck. The luck that the actual transformation would transform the compressed data to something that has redundancies and repitions. So, here is another idea, 1 (256) or 2 (65536) bytes that could describe either 256 or 65536 different tranforms. The program could try which of these transforms can create the most redudancy and then act on that transfrom and append the 1 or 2 bytes that describe which transformation that was used into the transformed data. This offcourse is still based on luck that any of the different kinds of transforms can create redundancy. I also understand that this is a stupid idea because it would take tons and tons of time to try all the different kinds of transforms..etc.. not very practical.
I guess I should abandon this idea. :)
I learned a lot from your replies. Thank you all.
"Phil Carmody" wrote in message news:87mzv0ghw2.fsf@nonospaz.fatphil.org... > "Rich Koup" writes: > > hehehe... So the times it actually compresses are the cases where it > > actually fails... > > If viewed with a twisted sense of humour and a thorough knowledge > of the Kraft inequality, yes. My comment is not necessarily a useful > one for teaching newbies though. (But it might be, who knows?) > > > Ok, I understand now. Maybe the technique I described can be applied to the > > Zip compression algorithm? Zip then do a transformation, zip again, > > transform... etc... > > > > When I say transformation, I mean flipping every nth bit.. or some other > > fancy method... > > > > What would happen then, would it work? > > If there's no _actual_ _concrete_ reason for the transformation > to make the data more compressible, then it won't do anything useful. > > The most famous transformations have concrete reasons why they work. > BWT changes how predictable patterns occur, which makes them easier > to find by common modellers. The predictable patters _were_ there in > the original, it's just that they've been made easier to exploit > (for real reasons based on knowledge of the types of files that people > most commonly work with). > > Just fiddling with bits on the hope that it might become compressible > is not such a transformation. It's "compression by luck", which ranks > alongside "compression by coincidence" as a futile technique. If you > don't know why you shouldn't be trying it, then you don't know enough > to be able to get it to work, so you shouldn't bother trying it. If > you know enough to know why you shouldn't be trying it, then you know > to not even bother trying it. > > > Phil > -- > Excerpt from Geoff Bulter's Proscriptive Dictionary: > aaa Don't use this, there's no such word > aaaa Don't use this, there's no such word > aaaaa Don't use this, there's no such word
|
|
 | | From: | Willem | | Subject: | Re: huffman coding over and over again | | Date: | Sun, 23 Jan 2005 23:58:02 +0000 (UTC) |
|
|
 | Rich wrote: ) Alright. It's based on luck. The luck that the actual transformation would ) transform the compressed data to something that has redundancies and ) repitions. So, here is another idea, 1 (256) or 2 (65536) bytes that could ) describe either 256 or 65536 different tranforms. The program could try ) which of these transforms can create the most redudancy and then act on that ) transfrom and append the 1 or 2 bytes that describe which transformation ) that was used into the transformed data. This offcourse is still based on ) luck that any of the different kinds of transforms can create redundancy. I ) also understand that this is a stupid idea because it would take tons and ) tons of time to try all the different kinds of transforms..etc.. not very ) practical.
If you try 65536 different transforms, then the best of those will, on average, give you a gain of at most 2 bytes.
This is a corollary of the Counting Theorem.
I suggest you find out what that theorem is, and try to understand it. That is the very first step you should take when trying to think of a compression technique.
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
|
|
 | | From: | cr88192 | | Subject: | Re: huffman coding over and over again | | Date: | Sun, 23 Jan 2005 00:27:31 GMT |
|
|
 | "Rich Koup" wrote in message news:19SdnSTTh7-Uf2_cRVn-tQ@rogers.com... > Is it possible to do huffman coding over and over again? Maybe try to flip > the even or odd bits between each recoding? Maybe try other > transformations > between each coding to shuffle the data so it seems as if it's a new piece > of data so it can be recoded again. > > When we want to decompress the data, we would do a huffman decode then > shuffle it back to the way it was, then decode again...etc... > > Is it possible? > don't think so.
for the input to huffman a difference in the probabilities for different values is assumed, and this is used to encode the data. for the output, in most cases all the values should have near-equal probabilities (or at least not different enough to make much use of).
bit shuffling or transforms are unlikely to fix this, as the results are still likely to be chaotic and fairly evenly distributed.
others may have other thoughts though...
|
|
|