Last active
December 15, 2015 12:08
// This character generator is similar to ScalaCheck's default one, but it filters out:
//   1. Unprintable control characters. These cause problems in Mongo queries, and should not be
//      able to get into our database, since they will not be present in inputs parsed from HTTP text.
//   2. Characters that are not valid standalone Unicode characters. We can exclude a big range
//      of these right away (which ScalaCheck does) because they're in the surrogate range
//      (0xD800 to 0xDFFF) used for two-character UTF-16 sequences, but others are only detected
//      by checking Character.isDefined(_).
//      The symptom of using a bad character in a string is that when you encode it to UTF-8 and
//      back again, you get a different string, so any of our web and database tests that check the
//      input against the output may fail, and you'll see a question mark somewhere in the data,
//      since that's how bad characters are displayed.
// This involves building a humongous list of characters, but it only gets built once, and since
// it's an array, generating random strings will be a bit faster than with ScalaCheck's default.
import org.scalacheck.Gen

// Tab (9), line feed (10), and carriage return (13) are the only characters below 0x20 that are kept.
lazy val validUnicodeChars: Array[Char] = (9.toChar :: 10.toChar :: 13.toChar ::
  ((0x20.toChar to 0xD7FF.toChar) ++ (0xE000.toChar to Char.MaxValue)).filter(Character.isDefined).toList
).toArray

def genUnicodeChar: Gen[Char] = Gen.oneOf(validUnicodeChars)
def genUnicodeString: Gen[String] = Gen.listOf(genUnicodeChar).map(_.mkString)
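
// A minimal sketch (not part of the original generator) of the UTF-8 round-trip symptom
// described in the header comment, using only the JVM standard library. The names
// Utf8RoundTrip and roundTripsUtf8 are hypothetical and exist only for illustration.
object Utf8RoundTrip {
  import java.nio.charset.StandardCharsets.UTF_8

  // True when encoding s to UTF-8 and decoding it again yields the same string.
  def roundTripsUtf8(s: String): Boolean =
    new String(s.getBytes(UTF_8), UTF_8) == s
}

// An unpaired surrogate cannot be encoded as UTF-8, so it does not survive the round trip:
//   Utf8RoundTrip.roundTripsUtf8(0xD800.toChar.toString)   // false
//   Utf8RoundTrip.roundTripsUtf8("plain text")             // true
// Assuming ScalaCheck's Prop.forAll, the same check could be run against the generator:
//   Prop.forAll(genUnicodeString) { s => Utf8RoundTrip.roundTripsUtf8(s) }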