looking for someone to fix humanize_number (test cases included)

Tue Dec 25 18:23:58 UTC 2012

On Tue, Dec 25, 2012 at 07:20:37AM -1000, Clifton Royston wrote:
> On Mon, Dec 24, 2012 at 12:00:01PM +0000, freebsd-hackers-request at freebsd.org wrote:
> > From: John-Mark Gurney <jmg at funkthat.com>
> > To: hackers at FreeBSD.org
> > Subject: looking for someone to fix humanize_number (test cases
> > 	included)
> > 
> > I'm looking for a person who is interested in fixing up humanize_number.
...
> > So I decided to write a test program to test the output, and now I'm even
> > more surprised by the output...  Neither 7.2-R nor 10-current give what
> > I expect are the correct results...
> > 
> > Feel free to take a look at the test program posted to:
> > http://people.freebsd.org/~jmg/humanize_numbers/
> > 
> > The .c contains what I think the output should be.
>  
>   I'm testing on 7.3R (yes, I know, I know, should be on 8 or 9) and
> see similar rounding problems; see below on the others.
> 
> > So far the bugs I know of:
> > 1) rounding is incorrect (started this whole search)
...
> > 3) some cases zero is returned though it isn't zero, more like 0T for 512 G
> >    (indexes 16, 17, 22, 23)
> 
>   I think these last are caused by integer wraparound and truncation in the
> integer constant calculations of your test program, once you get beyond 1G. 
...
>   There's another brain-blip bug, which took me a couple of minutes of
> staring to spot: your test skips over "peta-" and expects "exa-" (E) to
> come after "tera-".  Fixing that by replacing "1 E" and "2 E" with "1 P"
> and "2 P" corrects a couple more errors.  I'm left with indexes 1-11 all
> showing one less than expected ("0 K" for "1 K", and so on to "1 T" for
> "2 T"), and 25 and 27 showing the same problem, so at least it's down to
> just the rounding problem.
>   
>   There's actually another problem implicit in the results from the rounding
> problem - I think it should never yield "0 M" instead of "512 K"; for that
> matter, I would think anything up to "999 K" (divisor 1000) or "1023 K"
> (divisor 1024) should be represented with the smaller unit, not as "1 M".
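
  To illustrate the wraparound suspected in the quoted text above for
constants beyond 1G, here is a minimal sketch.  It is not taken from
the test program; the function names are made up for illustration.
It uses unsigned 32-bit arithmetic, which wraps in a well-defined way,
to show how 512 G can silently become zero, and a 64-bit constant to
show the intended value.

```c
#include <stdint.h>

/*
 * 512 G computed in 32-bit unsigned arithmetic.  The true value is
 * 2^39, which wraps modulo 2^32 to 0.  (Hypothetical helper.)
 */
static uint32_t
half_tera_32(void)
{
	return (uint32_t)512 * 1024 * 1024 * 1024;
}

/*
 * The same product with a 64-bit first operand, so every
 * multiplication is carried out in 64 bits.  (Hypothetical helper.)
 */
static int64_t
half_tera_64(void)
{
	return INT64_C(512) * 1024 * 1024 * 1024;
}
```

  Forcing the first operand to 64 bits (INT64_C, or a cast to int64_t)
is enough, since the remaining operands are then converted as well.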

  Having looked more closely at your test, I now see that it forces the
current behavior by setting the buffer length to 4, leaving room for
only 3 characters - so that part is reasonable.  
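
  The buffer arithmetic can be checked with a small sketch.  This is
not how humanize_number is implemented; it only assumes the
"<number> <prefix>" layout seen in the test output, and the helper
names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Bytes needed to hold "<value> <prefix>", including the terminating
 * NUL.  Returns 0 if snprintf reports an error.  (Hypothetical helper.)
 */
static size_t
bytes_needed(long value, char prefix)
{
	int n = snprintf(NULL, 0, "%ld %c", value, prefix);

	return (n < 0 ? 0 : (size_t)n + 1);
}

/* Nonzero if "<value> <prefix>" fits in a buffer of len bytes. */
static int
fits(long value, char prefix, size_t len)
{
	size_t need = bytes_needed(value, prefix);

	return (need != 0 && need <= len);
}
```

  Under that assumption "512 K" needs 6 bytes and "1 M" needs 4, so
with a 4-byte buffer only the larger unit can be represented.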

  I also realized that the flags and scale fields in the structure
initialization in the test code are swapped, which seemed to explain
some problems.  However, switching the order to the correct one, so
that the flags were actually used, revealed a lot more problems, for 
instance:

mismatch on index 1, got: "500", expected "1 K". (correct!)
mismatch on index 2, got: "500", expected "1 M".
mismatch on index 3, got: "500", expected "1 G".
...
mismatch on index 7, got: "150", expected "2 K".
mismatch on index 8, got: "150", expected "2 M".
...
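
  A swap like the flags/scale one is easy to make because both fields
are plain ints, so the compiler accepts either order.  The sketch
below shows the hazard and one way to guard against it with C99
designated initializers.  The struct layout, field names, and the
EX_* values are assumptions for illustration, not the test program's
actual definitions.

```c
#include <stdint.h>

/* Hypothetical test-case record; names and order are assumptions. */
struct hn_case {
	int64_t		 num;
	int		 scale;
	int		 flags;
	const char	*expected;
};

/* Made-up stand-in values, not the real HN_* constants. */
#define EX_SCALE	0x20
#define EX_FLAG		0x01

/* Positional form: flags and scale are swapped, yet this compiles. */
static const struct hn_case swapped = { 500, EX_FLAG, EX_SCALE, "1 K" };

/* Designated form: each value is tied to its field by name. */
static const struct hn_case fixed = {
	.num = 500,
	.scale = EX_SCALE,
	.flags = EX_FLAG,
	.expected = "1 K",
};
```

  With the designated form, reordering the struct members or the
initializer entries no longer changes which field gets which value.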

  I now question whether it works correctly with any flags other
than 0.  The man page states:

  "The len argument must be at least 4 plus the length of suffix, in
   order to ensure a useful result is generated into buffer." 

which this satisfies, but in fact larger sizes don't seem to be adequate
either; for example, with a 6-character buffer:

mismatch on index 1, got: "500", expected "1 K". (correct!)
mismatch on index 2, got: "50000", expected "1 M".
mismatch on index 3, got: "50000", expected "1 G".
...
mismatch on index 11, got: "15000", expected "2 P".
mismatch on index 13, got: "512 ", expected "1 K". (correct!)
mismatch on index 14, got: "52428", expected "1 M".
mismatch on index 15, got: "53687", expected "1 G".
...

  I am bemused.
  -- Clifton

-- 
   Clifton Royston  --  cliftonr at iandicomputing.com / cliftonr at volcano.org
       President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services