Tuesday 28th February, 2017

Locale based sorting

Let's stereotype for a moment: developers whose first (and in many cases, such as my own, only) language is English, don't always appreciate localisation issues to the degree we should. With people using DataTables all over the world (68 community translations for DataTables core) this is something that I'm very aware of as a library developer.

While DataTables offers options for localisation of the text it uses, we also need to consider the data that the table contains. Sorting is one area where this has proven to be particularly difficult to get right, but fortunately with new Javascript APIs that are now available in browsers, we can get locale based sorting absolutely right every time.

Also, don't make the mistake of thinking that this post is only for developers who are working in languages other than English - the sorting options made available by the Intl API offer something for everyone!

Standardisation

String sorting in Javascript is often done with simple comparison of Unicode code points, under the assumption that characters with a higher code point should be sorted after those with lower code points. That of course is nonsense - it might work for many cases (particularly ASCII) but isn't globally suited.
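To make the problem concrete, here is a minimal sketch of what the default comparison does. The names are hypothetical sample data; strictly speaking the built-in sort compares UTF-16 code units, which is why upper case sorts before lower case and the accented character ends up last.

```javascript
// Hypothetical sample data ('\u00C9' is a precomposed capital E with acute accent)
var names = [ 'Zoe', '\u00C9mile', 'adam', 'Eric' ];

// With no comparator, Array.prototype.sort compares UTF-16 code units:
// 'E' (69) < 'Z' (90) < 'a' (97) < '\u00C9' (201)
var defaultSorted = names.slice().sort();

console.log( defaultSorted ); // [ 'Eric', 'Zoe', 'adam', 'Émile' ]
```

A reader would normally expect adam, Émile, Eric, Zoe here, which is not what the plain comparison delivers.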

Javascript 1.2 introduced localeCompare but this suffers from performance issues and significant differences between browsers and platforms.

Which leads us to the ECMA-402 standard - an internationalisation API for Javascript. This API largely solves the issues with sorting localised strings in Javascript. It even goes beyond that and offers some really useful features such as control of case sensitivity, ignoring punctuation and consideration for numeric values in a string.

Sorting with the Intl API

Sorting with the Intl API is done by creating a collator for the locale in question. This is essentially an optimisation step to ensure that we only perform the setup for the language and options once. The resulting collator object has a compare method that we can pass directly to Array.prototype.sort as the comparison function. For example:

var collator = new Intl.Collator( 'fr' );

arrayToSort.sort( collator.compare );

And that's it! The MDN documentation details the options that can be provided to the collator constructor, so all we need to do now is wrap it into a form that can be used by DataTables.
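Before moving on to DataTables, here is a small self-contained sketch of the collator at work. The word list is hypothetical sample data chosen to include accented characters.

```javascript
// Hypothetical French sample data
var words = [ 'zèbre', 'éclair', 'abricot', 'eau' ];

// Create the collator once, then reuse its compare method
var collator = new Intl.Collator( 'fr' );

// slice() so the original array is left untouched
var sorted = words.slice().sort( collator.compare );

console.log( sorted ); // [ 'abricot', 'eau', 'éclair', 'zèbre' ]
```

With the default sort, éclair would have been placed after zèbre.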

Interface with DataTables

DataTables' string sorting is implemented as a simple code point comparison, which as noted above, isn't always going to be suitable. Thus, what we want to do is replace the default string sorting with that offered by the Intl API. We can do that by simply overwriting the default asc and desc methods for the string sort in DataTables (which is exposed via its standard sorting plug-in API).

        var collator = new window.Intl.Collator( ... );
        var types = $.fn.dataTable.ext.type;

        delete types.order['string-pre'];
        types.order['string-asc'] = collator.compare;
        types.order['string-desc'] = function ( a, b ) {
            return collator.compare( a, b ) * -1;
        };

There are two important points to make here:

  1. We delete the string-pre formatter function that DataTables uses. The default sort in DataTables is case insensitive, which we might, or might not, want with the Intl sort.
  2. The descending function is a little more interesting than the ascending one - we need to invert the default from the collator compare method - which is done simply by multiplying the result by -1.
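The inversion described in the second point can be tried outside of DataTables. This is only a sketch with hypothetical data, using plain arrays in place of the sorting plug-in API:

```javascript
var collator = new Intl.Collator( 'en' );

// Same shape as the string-desc function: flip the sign of the result
var descending = function ( a, b ) {
    return collator.compare( a, b ) * -1;
};

// Hypothetical sample data
var data = [ 'banana', 'apple', 'cherry' ];

// collator.compare is a bound function, so it can be passed around
// without losing its reference to the collator
var asc = data.slice().sort( collator.compare );
var desc = data.slice().sort( descending );

console.log( asc );  // [ 'apple', 'banana', 'cherry' ]
console.log( desc ); // [ 'cherry', 'banana', 'apple' ]
```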

Now we wrap that into a function that can be called like many of the other DataTables plug-ins:

$.fn.dataTable.ext.order.intl = function ( locales, options ) {
    if ( window.Intl ) {
        var collator = new window.Intl.Collator( locales, options );
        var types = $.fn.dataTable.ext.type;

        delete types.order['string-pre'];
        types.order['string-asc'] = collator.compare;
        types.order['string-desc'] = function ( a, b ) {
            return collator.compare( a, b ) * -1;
        };
    }
};

For older browsers that don't support the Intl API, the above code will do nothing; the user is left with the old default code point based sorting. All current browsers support this API, so it will only be users with legacy browsers who wouldn't see the benefits.
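The same feature detection idea can be sketched independently of DataTables. Note that getComparator is a hypothetical helper written for this illustration, not part of DataTables or the plug-in, and the fallback shown is just a simple stand-in for the old code point based behaviour.

```javascript
// Hypothetical helper: returns an Intl based comparator when available,
// otherwise a plain comparison function
function getComparator ( locales, options ) {
    if ( typeof Intl !== 'undefined' && Intl.Collator ) {
        return new Intl.Collator( locales, options ).compare;
    }

    // Fallback for environments without the Intl API
    return function ( a, b ) {
        return a < b ? -1 : ( a > b ? 1 : 0 );
    };
}

var compare = getComparator( 'de' );

// Hypothetical German sample data
var sorted = [ 'Zug', 'Ärger', 'Apfel' ].sort( compare );

console.log( sorted );
```

Where the Intl API is present, Ärger is sorted alongside Apfel; with the fallback it drops to the end of the list after Zug.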

Usage

To now actually use the Intl sorting in DataTables we need to execute the function defined above:

$.fn.dataTable.ext.order.intl();

$('#myTable').DataTable();

The above will use the browser's default locale. If you want to explicitly define a locale (which is more likely since you control the data shown in the table, not the user's browser!) you can do so using the first parameter (which is exactly the same as for Intl.Collator):

$.fn.dataTable.ext.order.intl( 'fr' ); // French locale

The second, optional, parameter is the options object, and this is what makes the Intl API really interesting, not just for localisation, but for better and more controlled sorting altogether. Notable options are:

  • sensitivity - What differences between strings should cause them to be sorted separately (for example accents). The values for this can be base, accent, case or variant.
  • ignorePunctuation - Flag to indicate if punctuation should be ignored.
  • numeric - Allow numeric data to be sorted as such rather than as a string.
  • caseFirst - Option to allow upper or lower case to be sorted before the other. This can be upper, lower or false.

The one I think that is perhaps of most interest is the numeric option as this comes up a lot in the DataTables forums. If you have mixed numeric and string data, this option allows the data to be sorted in the natural order rather than on a strictly string basis (e.g. 1, 2, 11 rather than 1, 11, 2).
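A few of these options can be tried directly against Intl.Collator. The following is a minimal sketch with hypothetical sample data; the same options object would be passed as the second parameter of the plug-in function.

```javascript
// Hypothetical sample data mixing text and numbers
var items = [ 'item 11', 'item 2', 'item 1' ];

// Plain string sort: '11' comes before '2'
var stringOrder = items.slice().sort();

// numeric: true compares runs of digits as numbers
var numericCollator = new Intl.Collator( 'en', { numeric: true } );
var naturalOrder = items.slice().sort( numericCollator.compare );

console.log( stringOrder );  // [ 'item 1', 'item 11', 'item 2' ]
console.log( naturalOrder ); // [ 'item 1', 'item 2', 'item 11' ]

// sensitivity: 'base' treats accent and case differences as equal
var baseCollator = new Intl.Collator( 'en', { sensitivity: 'base' } );
var sameWord = baseCollator.compare( 'resume', 'Résumé' );

console.log( sameWord ); // 0

// caseFirst: 'upper' places upper case before lower case
var upperFirst = new Intl.Collator( 'en', { caseFirst: 'upper' } );
var caseOrder = [ 'a', 'A' ].sort( upperFirst.compare );

console.log( caseOrder ); // [ 'A', 'a' ]
```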

CDN

This plug-in can be used simply by copying the code from above, or you can include it directly from the DataTables CDN.


Future development

The Intl API doesn't just offer options for sorting, but also for filtering - how to handle accents when filtering is another interesting and difficult topic, for example. While this post doesn't explore options for that, it is something that I will be looking into in future.

I also plan to have support for the Intl API baked into the next major version of DataTables rather than requiring a plug-in.

Enjoy!