Posts tagged ‘ecma’

JavaScript Challenge Revisited: Lotto Number Generator in Chains

Matthias Reuter from United Coders proposed a JavaScript Challenge: a Lotto Number Generator, with the following rules:

Write a JavaScript function that generates random lotto numbers. This function has to return an array of six different numbers from 1 to 49 (including both) in ascending order. You may use features of ECMA-262 only, that means no Array.contains and stuff. You must not induce global variables.

The function has to look like this

var getRandomLottoNumbers = function () {
    // your implementation here
};

Minify your function using JSMin (level aggressive) and count the bytes between the outer curly braces.

It might look simple, but it turns out to be an interesting challenge: there are many ways to solve it, and the main concern is the length of the minified implementation, the smaller the better. He describes his solution and invites others to post theirs as comments. His final solution for the challenge has 86 bytes:

var n=[],i=0;for(;++i<50;)n.push(i);for(;--i>6;)n.splice(i*Math.random()|0,1);return n
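
For readers who find the minified form hard to follow, here is a rough, unminified reading of the same idea: fill a pool with 1 to 49, then remove random entries until six are left. The variable names are mine, and the loop is restructured for clarity, so treat it as a sketch rather than the original source.

```javascript
// Readable expansion of the 86-byte solution; names are illustrative.
var getRandomLottoNumbers = function () {
    var numbers = [];
    var i;

    // Fill the pool with 1..49 (already in ascending order).
    for (i = 1; i < 50; i++) {
        numbers.push(i);
    }

    // Remove one random element per pass until only six remain.
    // numbers.length is the count of candidates left, so the
    // computed index always falls inside the array.
    while (numbers.length > 6) {
        numbers.splice(Math.floor(Math.random() * numbers.length), 1);
    }

    return numbers;
};
```

Because elements are only ever removed, the survivors keep their ascending order and no sort is needed.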

Some readers wrote even smaller solutions such as:

return [,,,,,,].map(function()Math.random()*50|1)

Although this one is by far the smallest, at only 49 bytes, it is invalid because it breaks the rules in the following ways:

  1. map is not part of ECMA-262 specification
  2. [,,,,,,] is inconsistent across browsers, IE would create an array of 7 undefined values
  3. it’s not in ascending order
  4. may contain duplicated values
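
Reasons 3 and 4 are easy to check mechanically. As a minimal sketch (isValidLottoResult is a hypothetical helper of mine, not part of the challenge), a candidate result could be validated like this:

```javascript
// Hypothetical helper: returns true when `result` satisfies the
// checkable rules of the challenge (six distinct integers from
// 1 to 49 in ascending order).
function isValidLottoResult(result) {
    if (Object.prototype.toString.call(result) !== '[object Array]' ||
            result.length !== 6) {
        return false;
    }
    for (var i = 0; i < result.length; i++) {
        var n = result[i];
        // Must be an integer number between 1 and 49, inclusive.
        if (typeof n !== 'number' || n !== Math.floor(n) || n < 1 || n > 49) {
            return false;
        }
        // Strictly ascending order also rules out duplicates.
        if (i > 0 && result[i - 1] >= n) {
            return false;
        }
    }
    return true;
}
```

Calling it in a loop over a few thousand generated results is a cheap way to catch a solution that only fails occasionally.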

So far the smallest solution that fulfills all the rules has 65 bytes:

for(var v=[],m=6,n=49;m;--n)Math.random()*n>m?0:v[--m]=n;return v
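
Unrolled, this one appears to be a form of selection sampling: it walks the candidates from 49 down to 1 and keeps each one with probability (numbers still needed) / (candidates still left). The sketch below is my own expansion with illustrative names, not the contributor's source.

```javascript
// Readable expansion of the 65-byte solution; names are illustrative.
var getRandomLottoNumbers = function () {
    var picked = [];      // result, filled from the last index down
    var remaining = 6;    // how many numbers still have to be picked
    var candidate;

    // With `candidate` numbers left to look at and `remaining` still
    // needed, keep the current one with probability
    // remaining / candidate. Once candidate equals remaining, every
    // candidate is kept, so the loop always ends with six picks.
    for (candidate = 49; remaining > 0; candidate--) {
        if (Math.random() * candidate <= remaining) {
            remaining--;
            picked[remaining] = candidate;
        }
    }
    return picked;
};
```

Larger candidates are visited first and stored at higher indexes, which is why the result comes out in ascending order without sorting.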

Revisiting

I also contributed a solution, but in a different style: a single chained expression, i.e. no semicolons separating top-level statements, like: return a().b.c().d.e()

My solution has 201 bytes:

return [0,0,0,0,0,0].join().replace(/0/g,function(){Array.a=Array.a||{};var x;while((x=Math.random()*49+1|0)in Array.a);Array.a[x]=1;return x}).split(',').sort(function(a,b){delete Array.a;return a-b})

It’s not the smallest compared to the other solutions, but it is a single chained expression that yields six distinct values from 1 to 49 in ascending order (as numeric strings, since split returns strings). It might be a little obscure for some, so I will break it down into smaller pieces:

return [0,0,0,0,0,0] // creates an array with 6 positions filled with 0's
    .join()         // converts the array into a string: "0,0,0,0,0,0"
    .replace(/0/g,  // replaces every 0 in the string using the following function
        function () {
            Array.a = Array.a || {};        // augments Array with an object property only once
            var x;                          // variable to store a random number
            while (                         // while condition: assures no dupes
                (x =                        // assigning a value to x:
                    Math.random() * 49 + 1 | 0  // random integer from 1 to 49
                ) in Array.a                // true while x has already been drawn
            );                              // empty body: just draw again
            Array.a[x] = 1;                 // records x as a property of the Array.a object
            return x                        // replaces the "0" found with the random number x
        }
    )               // end of replace function
    .split(',')     // converts the string into an array of strings using comma as separator
    .sort(          // sorts the array using the following compare function
        function (a, b) {   // elements to be compared
            delete Array.a; // restores Array, removing the previously augmented property
            return a - b    // < 0 then a < b; = 0 then a = b; > 0 then a > b
        }
    )               // end of sort, a sorted array is returned

I also invite you to post your single line chaining solution as a comment. Happy chaining. ;-)

String searching algorithms in JavaScript engines

I’ve just finished chapter 7, Writing Efficient JavaScript by Nicholas Zakas, of Steve Souders’ new book, Even Faster Web Sites. It presents several string optimization techniques to improve JavaScript performance, and it made me wonder which algorithm the String.indexOf method implements in JavaScript engines (aka ECMAScript engines).
A few months ago I asked Yahoo! fellow Douglas Crockford this question, and he said the ECMAScript standard does not require a specific algorithm, so it could vary with each browser. You can check that in section 15.5.4.7 of Standard ECMA-262. I then decided to download the source code of the most popular open-source JavaScript engines and found mainly 3 algorithms:

  • Naïve: the simplest and least efficient way to search in strings. It basically tries every position in the haystack and, at each one, compares every position of the needle. Its advantage is that it needs no initial overhead, such as building auxiliary tables. It has O(mn) complexity, where m is the needle length and n the haystack length.
  • Boyer-Moore: an efficient string searching algorithm that preprocesses the needle and doesn’t check every position in the haystack, but rather skips some of them after each unsuccessful attempt. It has O(n) complexity, with about 3n comparisons in the worst case.
  • Boyer-Moore-Horspool: a simplification of the Boyer-Moore algorithm with less overhead during the initial needle preprocessing. It typically runs in O(n) as well, but is O(mn) in the worst case.
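
To make the difference concrete, here is a minimal sketch of the naïve search and of Boyer-Moore-Horspool written in JavaScript. The function names are mine, and this is not code taken from any of the engines discussed here; it only illustrates the preprocessing and skipping described above.

```javascript
// Naive search: try every haystack position and compare the needle there.
function naiveIndexOf(haystack, needle) {
    var n = haystack.length, m = needle.length;
    for (var i = 0; i + m <= n; i++) {
        var j = 0;
        while (j < m && haystack.charAt(i + j) === needle.charAt(j)) {
            j++;
        }
        if (j === m) {
            return i;
        }
    }
    return -1;
}

// Boyer-Moore-Horspool: preprocess the needle into a shift table, then
// compare right to left and skip ahead based on the haystack character
// aligned with the last position of the needle.
function horspoolIndexOf(haystack, needle) {
    var n = haystack.length, m = needle.length;
    if (m === 0) {
        return 0;
    }
    // For every needle character except the last one, remember its
    // distance to the end of the needle (the rightmost occurrence wins).
    var shift = {};
    for (var k = 0; k < m - 1; k++) {
        shift[needle.charCodeAt(k)] = m - 1 - k;
    }
    var i = 0;
    while (i + m <= n) {
        var j = m - 1;
        while (j >= 0 && haystack.charCodeAt(i + j) === needle.charCodeAt(j)) {
            j--;
        }
        if (j < 0) {
            return i;
        }
        // Characters that do not occur in the needle allow a full skip.
        var skip = shift[haystack.charCodeAt(i + m - 1)];
        i += skip === undefined ? m : skip;
    }
    return -1;
}
```

The shift table is the "initial overhead" mentioned above: it costs a pass over the needle before the search starts, which only pays off when the haystack is long enough for the skips to matter.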

Algorithms by engines

The String.indexOf algorithm used by each JavaScript engine:

JavaScript Engine | Layout Engine  | Browsers                      | String.indexOf algorithm
------------------+----------------+-------------------------------+-------------------------------------------------------
SpiderMonkey      | Gecko          | Firefox up to 3.0.*           | Boyer-Moore-Horspool
TraceMonkey       | Gecko          | Firefox from 3.1.*            | Boyer-Moore-Horspool
KJS               | KHTML          | Konqueror                     | Naïve
JavaScriptCore    | WebKit         | Safari up to 3.*              | Naïve
SquirrelFish      | WebKit         | Safari from 4.*               | Naïve
JScript           | Trident        | Internet Explorer             | ?
V8                | WebKit         | Chrome                        | Strategy: Naïve, Boyer-Moore-Horspool and Boyer-Moore
Linear B          | Presto         | Opera from 7.0 to before 9.50 | ?
Futhark           | Presto Core 2  | Opera from 9.50               | ?
Rhino             | -              | -                             | java.lang.String.indexOf

SpiderMonkey

Implements String.indexOf in C, with some checks on the string lengths before running the BMH algorithm, in order to avoid a lengthy search on relatively small strings.

Source code available at: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/releases/3.0.13/source/firefox-3.0.13-source.tar.bz2

TraceMonkey

It has exactly the same String.indexOf implementation as SpiderMonkey but in C++.

Source code available at: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/releases/3.5.2/source/firefox-3.5.2-source.tar.bz2

KJS

The main part of the naïve implementation of indexOf follows*:

/* ... */
for (const UChar* c = data_ + pos; c <= end; c++)
    if (c->uc == fchar && !memcmp(c + 1, fdata, fsizeminusone))
        return (c - data_);
 
return -1;

* taken from KDE 4.0 API reference

Files related to String.indexOf method:

  • string_object.cpp: defines the prototype for the String object, where indexOf is a case in a switch statement that calls the find function.
  • ustring.cpp: defines the find function where the naïve algorithm is implemented.

Browse the source code online: http://api.kde.org/4.0-api/kdelibs-apidocs/kjs/html/files.html

JavaScriptCore & SquirrelFish

These engines are known as JavaScriptCore in the WebKit Project and were originally derived from KJS, hence they still share the same algorithm for String.indexOf.

Files related to String.indexOf method:

  • root/trunk/JavaScriptCore/runtime/StringPrototype.cpp: this is where the indexOf method is defined and calls the find function
  • root/trunk/JavaScriptCore/runtime/UString.cpp: look for the find function

Source code available at: http://webkit.org/building/checkout.html
Browse the source code online: http://trac.webkit.org/browser

V8

A very smart strategy is applied to the string searching in order to choose the best algorithm based on the length of the needle:

  • First of all, it checks whether the needle contains non-ASCII characters while the haystack is ASCII-only, and bails out if so, since no match is possible.
  • If the needle length is less than 5, it uses a naïve solution called simpleIndexOf, because the maximum shift Boyer-Moore can achieve on such a short needle doesn’t compensate for the overhead. This simpleIndexOf function never bails out, meaning the needle is checked for a match across the whole haystack.
  • If the needle length is greater than or equal to 5, another simpleIndexOf function is called. This one keeps track of how much work has been done (unsuccessful matches) in order to stop trying and switch to a better algorithm. This is called the “badness count”: once it reaches its maximum, the search stops and returns the index in the haystack from which the next algorithm should continue.
  • The next algorithm in the strategy chain is Boyer-Moore-Horspool, which also considers the badness count before jumping to the next algorithm.
  • The last one is Boyer-Moore, which has some initial overhead for creating the good and bad shift tables.
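
The shape of that strategy chain can be sketched in a few lines of JavaScript. This is not V8's code: the function names, the threshold constant and the way the budget is computed are all assumptions of mine, and the final Boyer-Moore stage is left out to keep the sketch short. It only shows the idea of starting cheap and handing over to a preprocessed search once too much work has been wasted.

```javascript
// All names and thresholds below are illustrative assumptions,
// not values taken from V8.
var SHORT_NEEDLE = 5;

// Plain scan from `start`. When `budget` is given, give up once more
// than `budget` character comparisons were spent on failed attempts
// and report where the next algorithm should resume.
function simpleSearch(haystack, needle, start, budget) {
    var n = haystack.length, m = needle.length, wasted = 0;
    for (var i = start; i + m <= n; i++) {
        var j = 0;
        while (j < m && haystack.charCodeAt(i + j) === needle.charCodeAt(j)) {
            j++;
        }
        if (j === m) {
            return { index: i, done: true };
        }
        wasted += j + 1;
        if (budget !== undefined && wasted > budget) {
            return { index: i + 1, done: false };   // resume point
        }
    }
    return { index: -1, done: true };
}

// Boyer-Moore-Horspool continuing from `start`.
function horspoolSearch(haystack, needle, start) {
    var n = haystack.length, m = needle.length, shift = {};
    for (var k = 0; k < m - 1; k++) {
        shift[needle.charCodeAt(k)] = m - 1 - k;
    }
    var i = start;
    while (i + m <= n) {
        var j = m - 1;
        while (j >= 0 && haystack.charCodeAt(i + j) === needle.charCodeAt(j)) {
            j--;
        }
        if (j < 0) {
            return i;
        }
        var skip = shift[haystack.charCodeAt(i + m - 1)];
        i += skip === undefined ? m : skip;
    }
    return -1;
}

function strategyIndexOf(haystack, needle) {
    if (needle.length < SHORT_NEEDLE) {
        // Short needle: the simple scan never bails out.
        return simpleSearch(haystack, needle, 0).index;
    }
    // Longer needle: start simple, but with a limited budget.
    var attempt = simpleSearch(haystack, needle, 0, 8 * needle.length);
    if (attempt.done) {
        return attempt.index;
    }
    // Budget exhausted: continue with Horspool from the resume point.
    return horspoolSearch(haystack, needle, attempt.index);
}
```

The point of the design is that a match found early never pays for building a shift table, while a long fruitless scan eventually does.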

Source code available at: http://code.google.com/p/v8/wiki/Source?tm=4

Rhino

Rhino runs on top of the Java Virtual Machine and uses java.lang.String.indexOf as implemented by the Java runtime it runs on. Interestingly, there is a comment in the source saying:

“Uses Java String.indexOf(). OPT to add - BMH searching from jsstr.c”.

Here jsstr.c is the file with SpiderMonkey’s JavaScript String implementation. Implementing such an algorithm in Java might degrade the search performance, unless the java.lang.String.indexOf implementation is much worse than that.

Source code available at: http://www.mozilla.org/rhino/download.html

Other engines

What about the JavaScript engines of Internet Explorer, Opera and other browsers? As they aren’t open-source projects, I could not check their code. :-(

Benchmark

By running a simple test across some browsers we can get an idea of how fast or slow String.indexOf is on some JavaScript engines, although this doesn’t necessarily mean one algorithm is better than another, because the performance of the engine itself might affect the outcome.
The test searches for the word “foobar” placed in the middle of a “lorem ipsum” text of about 1200 characters. Each run repeats the search 1 million times, and the reported figure is the average over 100 runs. Try it yourself.
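
For reference, a test of this kind can be sketched as follows. This is not the original test page: benchmarkIndexOf, the filler text and the way the haystack is built are assumptions of mine that only mirror the description above.

```javascript
// Hypothetical helper mirroring the test described above.
function benchmarkIndexOf(runs, iterations) {
    // Build a filler text of roughly 1200 characters with the needle
    // placed in the middle.
    var filler = 'lorem ipsum dolor sit amet ';
    var half = '';
    while (half.length < 600) {
        half += filler;
    }
    var haystack = half + 'foobar' + half;

    var total = 0;
    for (var r = 0; r < runs; r++) {
        var start = new Date().getTime();
        for (var i = 0; i < iterations; i++) {
            haystack.indexOf('foobar');
        }
        total += new Date().getTime() - start;
    }
    return total / runs;   // average time per run, in milliseconds
}
```

With this helper, the setup described above would correspond to benchmarkIndexOf(100, 1000000). An optimizing engine may shortcut a call whose result is never used, so numbers from a loop like this should be read with some caution.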

The results in the following table were taken by running this test on the same machine (Pentium 4 HT, 3 GHz, 1 GB RAM, Windows XP SP3).

JavaScript Engine | Browser           | Version      | Average (ms)
------------------+-------------------+--------------+-------------
V8                | Chrome            | 2.0.172.39   |       827.66
SpiderMonkey      | Firefox           | 3.0.13       |       947.97
TraceMonkey       | Firefox           | 3.5.2        |      1169.25
SquirrelFish      | Safari            | 4.0.2        |      1207.02
KJS*              | Konqueror         | 4.2.2        |      1361.59
SpiderMonkey      | Firefox           | 2.0.0.20     |      1456.57
Futhark           | Opera             | 10.00 beta 2 |      1549.06
Futhark           | Opera             | 9.64         |      1613.02
JScript**         | Internet Explorer | 8.0          |      3101.23
Rhino***          | -                 | -            |      4103.64
JScript           | Internet Explorer | 6.0          |      4479.82
JScript           | Internet Explorer | 7.0          |      4515.08

* running on the same machine with Ubuntu 9.04 live cd
** running on a VM on a different computer
*** running on Sun JRE 6 - 1.6.0_14

Again, these results don’t prove which algorithm is the best, because browser performance differs. However, it is worth noting that Firefox 3.0.13 performed better than Firefox 3.5.2 on this benchmark. Internet Explorer had the worst results, which may be due to the algorithm, the browser performance itself, or both. :-)