PropCheck: utf8 generator doesn't seem to respect resize/2

aaronrenner · February 1, 2019, 6:04am

I’m going through the examples in the Custom Generators chapter of Property-Based Testing with PropEr, Erlang, and Elixir, and I’m unable to make the generated bio: UTF-8 string grow with the resize function.

property "profile 2", [:verbose] do
  forall profile <- [
    name: utf8(),
    age: pos_integer(),
    bio: sized(s, resize(s * 35, utf8()))
  ] do
    name_len = to_range(10, String.length(profile[:name]))
    bio_len = to_range(300, String.length(profile[:bio]))
    aggregate(true, name: name_len, bio: bio_len)
  end
end
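
For readers without the book at hand: `to_range/2` is not part of PropCheck. It is a small helper defined in the book's test module that buckets a value into a `{low, high}` range. A minimal sketch, assuming it behaves the way the stats below suggest (the module name `RangeSketch` is made up for illustration):

```elixir
defmodule RangeSketch do
  # Assumed behaviour: bucket n into the range of width m that contains it.
  # The lower bound is inclusive, the upper bound exclusive.
  def to_range(m, n) do
    base = div(n, m)
    {base * m, (base + 1) * m}
  end
end

IO.inspect(RangeSketch.to_range(10, 25))
# => {20, 30}
IO.inspect(RangeSketch.to_range(300, 450))
# => {300, 600}
```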

According to the book, the stats should look something like this, with the generated bio data spread over various lengths up to 1500 characters.

test output
 	
32% {name,{0,10}}
28% {bio,{0,300}}
12% {bio,{300,600}}
10% {name,{10,20}}
7% {name,{20,30}}
4% {bio,{600,900}}
3% {bio,{900,1200}}
1% {bio,{1200,1500}}
1% {name,{30,40}}

However, when I actually run the example, all of my bio data ends up in the {0,300} range.


50% {bio,{0,300}}
30% {name,{0,10}}
7% {name,{10,20}}
4% {name,{20,30}}
3% {name,{30,40}}
1% {name,{40,50}}
1% {name,{70,80}}
0% {name,{50,60}}
0% {name,{80,90}}
0% {name,{90,100}}
0% {name,{130,140}}
0% {name,{1130,1140}}

Exploring the issue

After some experimentation, I simplified things and decided to build my own property tests.

property "utf8 with no resizing", [:verbose] do
  forall my_string <- utf8() do
    collect(true, to_range(100, String.length(my_string)))
  end
end

# -- Stats: utf8 with no resizing --
# 95% {0,100}
# 5% {100,200}

property "utf8 large-resizing", [:verbose] do
  forall my_string <- resize(10_000, utf8()) do
    collect(true, to_range(100, String.length(my_string)))
  end
end

# -- Stats: utf8 large-resizing --
# 99% {0,100}
# 1% {100,200}

property "utf8 set utf8-size", [:verbose] do
  forall my_string <- utf8(10_000) do
    collect(true, to_range(100, String.length(my_string)))
  end
end

# -- Stats:  set utf8-size --
# 6% {5800,5900}
# 4% {5900,6000}
# 3% {600,700}
# 3% {1000,1100}
# 3% {1500,1600}
# 3% {2100,2200}
# 3% {4400,4500}
# ...

property "utf8 sized/resize macro", [:verbose] do
  forall my_string <- sized(s, resize(100 * s, utf8())) do
    collect(true, to_range(100, String.length(my_string)))
  end
end

# -- Stats:  utf8 sized/resize macro --
# 98% {0,100}
# 2% {100,200}

property "utf8 sized macro with setting utf8-size", [:verbose] do
  forall my_string <- sized(s, utf8(s * 100)) do
    collect(true, to_range(100, String.length(my_string)))
  end
end

# -- Stats:  utf8 sized macro with setting utf8-size --
# 11% {400,500}
# 10% {0,100}
# 9% {100,200}
# 9% {200,300}
# 6% {500,600}
# ...

It seems like the utf8 generator doesn’t compute its length based on the generation size. Instead, I have to pass in the desired length as an argument.

# Doesn't work
forall my_string <- resize(10_000, utf8()) do
 #...
end

# Works
forall my_string <- utf8(10_000) do
 #...
end
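
Applied to the book's profile example, the same workaround might look like the sketch below. It assumes `:propcheck` is a test dependency and that the file lives under `test/` so ExUnit is already started. The module name is hypothetical, the bucketing is inlined rather than calling the book's `to_range/2`, and the resulting distribution is not shown here.

```elixir
# Sketch only: assumes :propcheck is available and ExUnit has been started.
defmodule ProfileSketchTest do
  use ExUnit.Case
  use PropCheck

  property "profile with size-scaled bio", [:verbose] do
    forall profile <- [
             name: utf8(),
             age: pos_integer(),
             # Pass the bound to utf8/1 instead of wrapping utf8/0 in resize/2.
             bio: sized(s, utf8(s * 35))
           ] do
      bio_len = String.length(profile[:bio])
      # Bucket lengths in steps of 300, e.g. 450 falls into {300, 600}.
      low = div(bio_len, 300) * 300
      collect(true, {:bio, {low, low + 300}})
    end
  end
end
```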

The Question

Is it normal for the utf8 generator to not base its length on the generation size? I’m trying to figure out:

Am I doing something wrong?
Is this a bug that needs to be reported to proper/propcheck?
Is this an issue in the book that needs to be reported as errata?

If someone would like to play around with the code from the book themselves, they can download it here, expand the .tgz and view the file in code/CustomGenerators/elixir/pbt/test/generators_test.exs.

Thanks in advance for everyone’s feedback!

bibekp · January 22, 2020, 3:50am

I’m just wondering, did this ever get resolved? If so, what was the fix here? Thanks B

bibekp · January 22, 2020, 8:34pm

@aaronrenner @alfert

alfert · January 23, 2020, 5:58am

This is the first time I’ve seen this question. It would be helpful if you tried this out and opened a bug report on GitHub (https://github.com/alfert/propcheck/issues) if utf8 does not behave as expected. It might also help to check against master, since we are implementing a bunch of features.

But from the docs, you can see this:

@spec utf8(ext_non_neg_integer(), 1..4) :: type()
utf8(n, max_codepoint_size)

Bounded upper size utf8 binary, codepoint length =< MaxCodePointSize.

Limiting codepoint size can be useful when applications do not accept full unicode range. For example, MySQL in utf8 encoding accepts only 3-byte unicode codepoints in VARCHAR fields.

If unbounded length is needed, use :inf as first argument.
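
As a rough illustration of the two-argument form described in the docs above, here is a sketch that assumes `:propcheck` is a test dependency, that the file lives under `test/`, and that the first argument bounds the number of codepoints (the docs only say "bounded upper size"). The module name is hypothetical.

```elixir
# Sketch only: assumes :propcheck is available and ExUnit has been started.
defmodule Utf8BoundsSketchTest do
  use ExUnit.Case
  use PropCheck

  # At most 20 codepoints, each encoded in at most 3 bytes
  # (the MySQL-style restriction mentioned in the docs).
  property "bounded length and codepoint size" do
    forall s <- utf8(20, 3) do
      length(String.codepoints(s)) <= 20 and byte_size(s) <= 20 * 3
    end
  end

  # :inf drops the length bound; only the codepoint size is limited.
  property "unbounded length, 3-byte codepoints" do
    forall s <- utf8(:inf, 3) do
      Enum.all?(String.codepoints(s), fn cp -> byte_size(cp) <= 3 end)
    end
  end
end
```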