Friday, September 17, 2010

List of countries and regions from FreeBase



When developing a web application, it is typical to ask for the user's mailing address. Country and region names are easy to misspell, so it would be nice to have autocompletion for those two fields.

But where can we get the list of all countries in the world and their regions? Preferably in a format that a programming language can consume directly, for example JSON: you don't want to write an HTML parser to retrieve your data from Wikipedia pages.

This is where FreeBase comes in. FreeBase is a free database of structured data. You can think of it as Wikipedia for computers.

FreeBase allows you to write your query once and not revisit your program each time a country splits in two or time zone information changes. As soon as the new information is entered into FreeBase, your program will use it without any changes. Collecting this information manually is also a lot of work: just think how much time you would spend gathering all countries, regions, and cities, and how quickly that information goes stale. A hand-maintained list would be outdated in a matter of months.

First, we need to write a query to retrieve the list of all countries in the world.

http://www.freebase.com/api/service/mqlread?queries={"q1":{"query":[{"limit":1000,"name":null,"type":"/location/country"}]}}

This will produce the following result:

{
"code": "/api/status/ok",
"q1": {
"code": "/api/status/ok",
"result": [
{
"name": "United States of America",
"type": "/location/country"
},
{
"name": "Germany",
"type": "/location/country"
},
{
"name": "Australia",
"type": "/location/country"
},
{
"name": "Iran",
"type": "/location/country"
},
{
"name": "United Kingdom",
"type": "/location/country"
},
...
...
...
{
"name": "Rome, Italy 11.15.04",
"type": "/location/country"
},
{
"name": "Diff\u00e9rance",
"type": "/location/country"
},
{
"name": "Kingdom of Croatia-Slavonia",
"type": "/location/country"
},
{
"name": "Tyse",
"type": "/location/country"
},
{
"name": "Rain",
"type": "/location/country"
},
{
"name": "migrated from India in 1968",
"type": "/location/country"
},
{
"name": "Republic of Genoa",
"type": "/location/country"
},
{
"name": "Mygdonian",
"type": "/location/country"
},
{
"name": "Theban",
"type": "/location/country"
}
]
},
"status": "200 OK",
"transaction_id": "cache;cache01.p01.sjc1:8101;2010-09-19T00:03:10Z;0013"
}
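
Reading the country names out of a response shaped like the one above can be sketched as follows. The helper name extractNames is hypothetical, and the status check assumes nothing beyond the "code" fields visible in the sample response.

```javascript
// Minimal sketch: pull the names out of an mqlread response that uses the
// "queries" envelope shown above. extractNames is a hypothetical helper.
function extractNames(response, queryKey) {
  var OK = "/api/status/ok";
  var inner = response[queryKey];
  // Both the outer envelope and the named query carry a status code.
  if (response.code !== OK || !inner || inner.code !== OK) {
    throw new Error("mqlread query failed: " + queryKey);
  }
  return inner.result.map(function (item) {
    return item.name;
  });
}

// Usage with a trimmed copy of the response above.
var sample = {
  code: "/api/status/ok",
  q1: {
    code: "/api/status/ok",
    result: [
      { name: "Germany", type: "/location/country" },
      { name: "Rain", type: "/location/country" }
    ]
  }
};
console.log(extractNames(sample, "q1")); // logs the two names
```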


The result looks good at first. But wait, I've never heard of a country named "migrated from India in 1968".
Apparently, there is some noise in the database, which can be explained by mistakes of volunteer contributors. They confuse a topic's association with its type: instead of associating a topic with some country, they mistakenly mark the topic itself as a country.
To filter out this noise, we can keep only those countries that have a FIPS code. The query is:

[{
"type": "/location/country",
"limit": 600,
"name": null,
"id": null,
"fips10_4": {"value": null, "optional": false}
}]


I won't go into too much detail about MQL; you can read more on the FreeBase site. The basic idea is: if we provide a property with a value, the query searches by that value; if we provide an empty property (null), the query returns its value.
"optional" is a special directive that controls whether a property may be missing. "optional": false requires the property to exist.
Here is the URL for this query:

http://api.freebase.com/api/service/mqlread?query={"query":[{"type":"/location/country","limit":600,"name":null,"id":null,"fips10_4":{"value":null,"optional":false}}]}
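
Typing JSON into a URL by hand is error-prone, so the same URL can also be built from a query object. A minimal sketch follows; buildMqlReadUrl is a hypothetical helper, and the endpoint and the "query" envelope are copied from the URL above.

```javascript
// Hypothetical helper: build the mqlread URL from a query object instead of
// typing the JSON by hand. Endpoint and envelope are taken from the URL above.
var MQLREAD_URL = "http://api.freebase.com/api/service/mqlread";

function buildMqlReadUrl(query) {
  var envelope = JSON.stringify({ query: query });
  // encodeURIComponent escapes quotes, braces and brackets for use in a URL.
  return MQLREAD_URL + "?query=" + encodeURIComponent(envelope);
}

var countriesQuery = [{
  type: "/location/country",
  limit: 600,
  name: null,
  id: null,
  fips10_4: { value: null, optional: false }
}];

console.log(buildMqlReadUrl(countriesQuery));
```

The encoded form is longer than the hand-written URL but decodes to the same JSON. For a JSONP request through jQuery, a "callback=?" parameter can be appended to the returned URL.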

With the FIPS filter the result is much better: "migrated from India in 1968" is gone, and so are ancient Greek states like Theban, because they do not have a FIPS code.
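
The matching rule described above (a filled property filters, an empty property is returned, "optional": false requires the property to exist) can be illustrated with a toy matcher over an in-memory array. This is only a sketch of the idea, not how FreeBase evaluates MQL; runQuery and the two sample records are made up for illustration, and directives such as "limit" are left out.

```javascript
// Toy illustration of the query-by-example idea (hypothetical helper, not the
// FreeBase implementation). A non-null value in the template filters records;
// a null value asks for that property to be returned.
function runQuery(template, records) {
  var results = [];
  records.forEach(function (record) {
    var row = {};
    var keep = Object.keys(template).every(function (key) {
      var wanted = template[key];
      var actual = record.hasOwnProperty(key) ? record[key] : null;
      if (wanted === null) {
        row[key] = actual;            // empty property: return its value
        return true;
      }
      if (typeof wanted === "object") {
        // {"value": null, "optional": false}: the property must exist
        if (wanted.optional === false && actual === null) {
          return false;
        }
        row[key] = { value: actual };
        return true;
      }
      row[key] = actual;
      return actual === wanted;       // filled property: search by value
    });
    if (keep) {
      results.push(row);
    }
  });
  return results;
}

// Sample data for illustration: one country with a FIPS code, one noisy topic.
var records = [
  { type: "/location/country", name: "Germany", fips10_4: "GM" },
  { type: "/location/country", name: "Rain" }
];
var template = {
  type: "/location/country",
  name: null,
  fips10_4: { value: null, optional: false }
};
console.log(runQuery(template, records));
```

With this template only the Germany record survives, because the "Rain" record has no fips10_4 property. Dropping the fips10_4 line from the template would return both records.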

Now we want to do something useful with this information. As we have seen, the data in the database is not fully reliable, so the best approach is to let users select their country from the list but still leave them the option to enter text that is not in the database. jQuery UI's autocomplete widget is what we want.


<!doctype html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js" type="text/javascript"></script>
<script src="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.5/jquery-ui.min.js" type="text/javascript"></script>
<link rel="stylesheet" href="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.5/themes/smoothness/jquery-ui.css" type="text/css"/>
<script type="text/javascript">
$(function() {
// callback=? tells jQuery to request the data as JSONP
var countriesUrl = 'http://api.freebase.com/api/service/mqlread?callback=?&query={"query":[{"type":"/location/country","limit":600,"name":null,"id":null,"fips10_4":{"value":null,"optional":false}}]}';
$.getJSON(countriesUrl, function(data, textStatus, xhr) {
// Collect the country names from the result list
var names = $.map(data.result, function(d) { return d.name; });
// Suggest matching names while typing; free text is still accepted
$("#country").autocomplete({source: names});
});
});
</script>
</head>

<body>
<p>Country: <input id="country" type="text" /></p>
</body>
</html>


Here we go, autocomplete of country in 4 lines of code!

Watch out for the same-origin policy: the browser will not allow an XMLHttpRequest because the origin of your HTML page is different from api.freebase.com. To address this, we added "callback=?" to the query parameters, so the result is returned as JSONP.
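
What "callback=?" triggers behind the scenes can be sketched roughly as follows. This is a simplified illustration of the JSONP technique, not jQuery's actual code; loadJsonp and the callback naming scheme are hypothetical, and the doc and global parameters exist only so that the sketch can also run outside a browser.

```javascript
// Simplified JSONP sketch (not jQuery's implementation). A <script> tag is not
// subject to the same-origin policy, so the server wraps its JSON in a call to
// a function whose name we pass in the URL.
var jsonpCounter = 0;

function loadJsonp(url, onData, doc, global) {
  var callbackName = "jsonp_cb_" + (jsonpCounter++);
  var script = doc.createElement("script");

  // The server response is a script of the form: jsonp_cb_0({...});
  global[callbackName] = function (data) {
    delete global[callbackName];          // one-shot callback
    if (script.parentNode) {
      script.parentNode.removeChild(script);
    }
    onData(data);
  };

  script.src = url + (url.indexOf("?") === -1 ? "?" : "&") +
    "callback=" + callbackName;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return callbackName;
}

// In a browser, with a URL that has no callback parameter yet:
// loadJsonp(urlWithoutCallback, function (data) { /* use data.result */ },
//           document, window);
```

Because the response is executed as a script, JSONP should only be used with services you trust.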

There is a jQuery component called "freebase suggest", but it implements autocomplete in its own way, and we want to stick to standard components as much as possible. That's why we used jQuery UI's native "autocomplete".

In the next post I'll add querying the regions of the chosen country.