Another slightly OT q...

Wed May 9 16:30:34 UTC 2007

On 09/05/07, Ted Mittelstaedt <tedm at toybox.placo.com> wrote:
>
>
> > -----Original Message-----
> > From: owner-freebsd-questions at freebsd.org
> > [mailto:owner-freebsd-questions at freebsd.org]On Behalf Of Gary Kline
> > Sent: Tuesday, May 08, 2007 7:19 PM
> > To: usleepless at gmail.com
> > Cc: Gary Kline; FreeBSD Mailing List
> > Subject: Re: Another slightly OT q...
> >
> >
> >
> >       So it *was* a hoax?  Rats.  Some weeks ago on Public
> >       Broadcasting, a few sentences were spoken on the potential of
> >       fractal geometry to achieve [I'm guessing] data-compression on
> >       the order of what Sloot was claiming.  So far, no one has figured
> >       it out.  It may be a dream... .
> >
>
> There's some cool math out there that explains all of this but I never liked
> math, but it isn't necessary to know the math to understand the issue.  Just
> consider the problem for a while and you will realize that the compression
> ratio of a specific data stream varies dependent on the amount of repetition
> in the input datastream.  A perfectly unrandom datastream, like a constant
> series of logical 1's, carries no information, but has a compression ratio
> that is infinite.  A perfectly random datastream, on the other hand,
> also carries no information, but has a compression ratio that is zero.
> I believe that a datastream that is 50% of the way between either extreme
> carries the most information, and I believe your typical datastream is much
> closer to the perfectly unrandom side than the perfectly random side.
> Compression is merely the process of pushing the randomness of the stream
> closer to the random side.

Actually, the more information (as such) the closer
the data stream is to perfectly random.  The relationship
might be asymptotic, but I am no maths major.
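
For what it's worth, here is a rough sketch of that idea in Python.  It
measures the empirical Shannon entropy of the byte values in a stream.
The function name and the sample streams are my own, purely for
illustration, and this simple measure only looks at how often each byte
value occurs, not at the order the bytes come in.

```python
import math
import os
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte-value distribution (bits per byte)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# "Perfectly unrandom": one symbol repeated over and over.
constant = b"\x01" * 65536
# Random bytes from the operating system.
noise = os.urandom(65536)

print("constant stream:", entropy_bits_per_byte(constant), "bits/byte")
print("random stream:  ", entropy_bits_per_byte(noise), "bits/byte")
```

The constant stream comes out at 0 bits per byte, and the random one
lands very close to the maximum of 8.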

> Thus, if the input datastream is very close to the perfectly unrandom side -
> meaning it has a very high amount of repetition in it, you can get some
> pretty spectacular compression ratios.  But as you move closer to unrandom,
> you carry less data.  So, the better applications emit datastreams that
> are less unrandom, therefore compression does not work as well on them.

I suppose this leads to the discussion about what
"data" and "information" really are.  Imagine a can.
The can is data.  Imagine the can is full of worms.

> This of course is completely ignoring the other data issue, is the
> application data efficient to begin with?  For example, you can transfer
> about a page of information in ASCII that consumes about 1K of data, that
> same page of information in a MS Word file consumes a hundred times that
> amount of space - Word is therefore extremely inefficient with data.

In this case, since Word "has to" replace typesetting,
layout, and formatting software in addition to being a
word processor, the header and meta information tend
to bloat the files quite a lot.

Every few years someone comes along who makes
some mad claims about some new buzzword-enhanced
compression technology.  Obviously, if there is ever a
radical leap forward in that area the theory will have to
follow, since modern theory cannot accommodate (lossless)
compression past the point of randomness (generally less
than 16:1 even for Danielle Steel).  mp3, avi, real media,
mpeg, et al. are a different story entirely, since they are
lossy and optimised for their respective information.
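
The theoretical limit is really just a counting argument, and it can be
sketched in a few lines of Python.  The function names are made up for
this example.  There are 2^n bit strings of length n but only 2^n - 1
strings that are strictly shorter, so no lossless scheme can shrink
every input, and only a small fraction of inputs can be shrunk by much.

```python
def count_bitstrings(length: int) -> int:
    """Number of distinct bit strings of exactly this length."""
    return 2 ** length


def count_shorter_bitstrings(length: int) -> int:
    """Number of distinct bit strings strictly shorter than this length
    (lengths 0 through length - 1)."""
    return sum(2 ** k for k in range(length))


def max_fraction_shrunk_by(length: int, saved_bits: int) -> float:
    """Upper bound on the fraction of inputs of the given length that any
    lossless compressor can shorten by at least saved_bits bits."""
    # Every such input needs its own output of at most length - saved_bits bits.
    available_outputs = count_shorter_bitstrings(length - saved_bits + 1)
    return available_outputs / count_bitstrings(length)


n = 16
print("inputs of length", n, ":", count_bitstrings(n))
print("possible shorter outputs:", count_shorter_bitstrings(n))
print("fraction that can lose a whole byte:", max_fraction_shrunk_by(n, 8))
```

With n = 16 there are 65536 inputs and only 65535 shorter outputs, so at
least one input cannot shrink, and fewer than 1 in 128 can lose a full
byte.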

-rw-r--r--  1 1705  1705  7826420 May  9 10:58 ssion_i_really_fuckin_care_about_you.rm
-rw-r--r--  1 1705  1705  7791691 May  9 10:58 ssion_i_really_fuckin_care_about_you.rm.bz2

In this case, the file is only very slightly compressible (with some
data the resulting file will even be slightly larger).  Yet the raw
datastream would probably have been many tens, if not several hundreds,
of megabytes.  It looks like it was filmed with a cameraphone, though
most likely it was an 8mm digicam; these, I believe, compress on the
fly, so the raw datastream never touches tape.
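
Anyone who wants to see the same effect without a media file to hand can
try something like the Python sketch below.  It is only an illustration:
the random bytes are my stand-in for an already-compressed file, and the
helper name is invented.  The last line works out the ratio for the two
file sizes in the listing above.

```python
import bz2
import os


def bz2_ratio(data: bytes) -> float:
    """Original size divided by bz2-compressed size (above 1 means it shrank)."""
    return len(data) / len(bz2.compress(data))


# Highly repetitive input: one byte value repeated a million times.
repetitive = b"\x01" * 1_000_000
# Assumption: random bytes stand in for an already-compressed media file.
noise = os.urandom(1_000_000)

print("repetitive:", bz2_ratio(repetitive))
print("random:    ", bz2_ratio(noise))

# File sizes taken from the listing above (.rm versus .rm.bz2).
print(".rm file:  ", 7826420 / 7791691)
```

The repetitive input shrinks enormously, the random input stays at about
its original size (and will usually come out a little bigger), and the
.rm file gives up only about 0.4% of its size.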

Remember life before the tweel?
