Saturday, April 17, 2010

Oracle database geocoding with Google earth

Recentley someone asked me if I could help with a project where he was needed to gecode addresses and store this information in a Oracle database. For those who do not know what geocoding is, wikipedia has the following to say about it;

"Geocoding is the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes). With geographic coordinates the features can be mapped and entered into Geographic Information Systems, or the coordinates can be embedded into media such as digital photographs via geotagging."


His original plan was to use a BASH script to download information from Google Earth and grep the needed information from the results from Google. The second step would be to load the data in the database and start using it. However I used a different technique some time ago to do something similar and we came to the conclusion all could be done from the inside the database itself. In Oracle 11G (and other releases) you can use the function HTTPURITYPE to retrieve information from a web address using the HTTP protocol. As google maps is using the HTTP protocol you can also use it to download the needed information. This is working the best if you retrieve the information in a XML format.

Google is providing you the option to download information from the google maps website in a XML format. The XML format used to described the information you are getting from Google is named KML.

"Keyhole Markup Language (KML) is an XML-based language schema for expressing geographic annotation and visualization on existing or future Internet-based, two-dimensional maps and three-dimensional Earth browsers. KML was developed for use with Google Earth, which was originally named Keyhole Earth Viewer. It was created by Keyhole, Inc, which was acquired by Google in 2004. The name "Keyhole" is an homage to the KH reconnaissance satellites, the original eye-in-the-sky military reconnaissance system first launched in 1976. KML is an international standard of the Open Geospatial Consortium. Google Earth was the first program able to view and graphically edit KML files, and other projects such as Marble have also started to develop KML support."

Lets say for example I want to have information about the location of my office I can check the google maps website by entering the following URL http://maps.google.com/maps?f=q&source=s_q&geocode=&q=Netherlands,Utrecht,papendorpseweg 100&sie=UTF8 now we do however still have the information in a HTML format while we want to have the coordinates in a XML format. By adding &output=kml to the url we are not served a website we are served a XML file http://maps.google.com/maps?f=q&source=s_q&geocode=&q=Netherlands,Utrecht,papendorpseweg 100&sie=UTF8&output=kml . If we look in the file we will find all the information about the coordinates we need. The file looks as below:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Placemark>
<name>Papendorpseweg 100, 3528 Utrecht, The Netherlands</name>
<address>Papendorpseweg 100, 3528 Utrecht, The Netherlands</address>
<styleUrl>root://styleMaps#default+nicon=0x304+hicon=0x314</styleUrl>
<Point>
<coordinates>5.089984,52.064688,0</coordinates>
</Point>
<LookAt>
<longitude>5.089984</longitude>
<latitude>52.064688</latitude>
<range>1000.000000</range>
</LookAt>
</Placemark>
</kml>


Now we need to get this information into the database, the first step is to download the file using HTTPURITYPE and after that we will not store the XML file, we will query it in the same statement that downloads the file and extract the information by making use of EXTRACTVALUE. This will result in a example query as below:

SELECT
EXTRACTVALUE(value(googleData), '/Placemark/name', 'xmlns="http://earth.google.com/kml/2.0"') as "NAME"
,EXTRACTVALUE(value(googleData), '/Placemark/address', 'xmlns="http://earth.google.com/kml/2.0"') as "ADDRESS"
,EXTRACTVALUE(value(googleData), '/Placemark/styleUrl', 'xmlns="http://earth.google.com/kml/2.0"') as "STYLURL"
,EXTRACTVALUE(value(googleData), '/Placemark/Point/coordinates', 'xmlns="http://earth.google.com/kml/2.0"') as "COORDINATES"
,EXTRACTVALUE(value(googleData), '/Placemark/LookAt/longitude', 'xmlns="http://earth.google.com/kml/2.0"') as "LONGITUDE"
,EXTRACTVALUE(value(googleData), '/Placemark/LookAt/latitude', 'xmlns="http://earth.google.com/kml/2.0"') as "LATITUDE"
,EXTRACTVALUE(value(googleData), '/Placemark/LookAt/range', 'xmlns="http://earth.google.com/kml/2.0"') as "RANGE"
FROM
TABLE(
XMLSEQUENCE(
EXTRACT(
HTTPURITYPE(
'http://maps.google.com/maps?f=q&'||'source=s_q&'||'geocode=&'||'q=netherlands,Utrecht,papendorpseweg+100&'||'ie=UTF8&'||'output=kml'
).getXML()
,'/kml/Placemark', 'xmlns="http://earth.google.com/kml/2.0"'
)
)
) googleData;


This query itself is not very useful for the solution I proposed however if you add this query to a package that will do a lot more like getting a list of addresses and adding the address to the query as a variable you will be able to geocode lots of addresses directly from your database by making use of Google without any outside scripts. One thing you have to remember is that you will need access to outside locations, you might retrieve a error like "ORA-24247: network access denied by access control list (ACL)". In this case you will have to make sure you have set the access control list correct for the user you are using to execute the query. You will need to be able to lookup maps.google.com and earth.google.com . To set the access control list correct you can have a read in this blogpost I have been writing some time ago.

No comments: