For the attention of last week

archive of tokumine.com

Javascript for Rubyists

“JavaScript is the only language out there that people think they can program without actually learning it.” – D.Crockford

We’ve all used JavaScript. At least, we’ve all used jQuery to manipulate some part of the DOM. Maybe we think we understand the syntax well enough, so we ‘improvise’ when we’re forced to write some. Maybe we tried it a few years ago and ran screaming from this seemingly broken language.

Read the rest of this entry »

Pool implementation for redis/nodejs

Redis, meet node.js. You’ll get on well together.

This gist makes a pool of redis connections for your node.js application, allowing you to create 1 pool per redis database.

Depends on node_redis and node-pool. Code after the jump.

Read the rest of this entry »

10 minute lightning talk on Mapnik and node-mapnik

I gave a lightning talk last week about making maps using Mapnik and node-mapnik, even managing to slip in a demo 🙂

Here are the slides:

Trigram inverted search index generator/client in Ruby + JS client

I needed a high-speed autocomplete dropdown box in some recent work that couldn’t depend on an external service, and that had to be faster than regex parsing of the search corpus. We needed an autocomplete that you could embed in a static webpage.

Following on my recent algorithmic explorations, I implemented a trigram inverted search index generator and client in Ruby and JS. You can test out the results in the demo below.

It was pretty good fun (and simple!) to learn about the wonderful world of n-grams, and how darn useful they are. Also, as I basically implemented the algorithm based on the information on Wikipedia, it really solidified my stance on software patents.

Demo: http://tokumine.github.com/trigram_search/

Repository: https://github.com/tokumine/trigram_search
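To make the idea concrete, here is a minimal sketch of a trigram inverted index in plain JavaScript. It is not the code from the repository: the function names, the padding scheme and the ranking (a simple count of shared trigrams) are all assumptions made for illustration.

```javascript
// Minimal trigram inverted index (illustrative names, not the repository's API).

// Split a string into its lowercase trigrams. Two leading spaces and one
// trailing space are added so that prefixes of words produce trigrams too.
function trigrams(text) {
  const padded = `  ${text.toLowerCase()} `;
  const grams = new Set();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// Build the inverted index: trigram -> Set of positions in `corpus`.
function buildIndex(corpus) {
  const index = new Map();
  corpus.forEach((entry, id) => {
    for (const gram of trigrams(entry)) {
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(id);
    }
  });
  return index;
}

// Score each entry by the number of trigrams it shares with the query and
// return the best matches first (ties keep corpus order).
function search(index, corpus, query, limit = 5) {
  const scores = new Map();
  for (const gram of trigrams(query)) {
    for (const id of index.get(gram) || []) {
      scores.set(id, (scores.get(id) || 0) + 1);
    }
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || a[0] - b[0])
    .slice(0, limit)
    .map(([id]) => corpus[id]);
}

const corpus = ["Kenya", "Kazakhstan", "Kyrgyzstan", "Japan"];
const index = buildIndex(corpus);
console.log(search(index, corpus, "kaz"));
```

Because a lookup only touches the handful of trigrams in the query, it never has to scan the whole corpus, which is what makes this approach attractive for a static page. The index itself could be generated ahead of time and shipped as JSON.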

Efficient detection of polygon intersections in Javascript with sweeplines

I needed to be able to detect complex polygon intersections in the browser, so I spent some time exploring and implementing the Bentley–Ottmann sweep line algorithm for detecting crossings in a set of line segments in JavaScript. It uses an AVL binary tree and an event queue to run in O((n + k) log n) time, where n is the number of segments and k the number of crossings. The code on GitHub is written to run on node.js, but it can easily be adapted to run in a browser.

Sweepline Repository on github
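For context, the sketch below shows the segment-versus-segment test that any crossing detector needs, wrapped in a brute-force O(n²) pair check. This is not the sweep line algorithm and not the repository's code: it is the naive baseline that Bentley–Ottmann improves on, and all names are made up for illustration. It also ignores floating-point robustness.

```javascript
// Sign of the cross product (b - a) x (c - a):
// positive = counter-clockwise turn, negative = clockwise, 0 = collinear.
function orientation(a, b, c) {
  return Math.sign((b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x));
}

// True if p lies inside the bounding box of segment [a, b]. Only meaningful
// once p is already known to be collinear with a and b.
function onSegment(a, b, p) {
  return Math.min(a.x, b.x) <= p.x && p.x <= Math.max(a.x, b.x) &&
         Math.min(a.y, b.y) <= p.y && p.y <= Math.max(a.y, b.y);
}

// True if segments s and t share at least one point (touching counts).
function segmentsIntersect(s, t) {
  const o1 = orientation(s.a, s.b, t.a);
  const o2 = orientation(s.a, s.b, t.b);
  const o3 = orientation(t.a, t.b, s.a);
  const o4 = orientation(t.a, t.b, s.b);
  // General case: each segment's endpoints lie on opposite sides of the other.
  if (o1 !== o2 && o3 !== o4) return true;
  // Collinear cases: an endpoint of one segment lies on the other segment.
  if (o1 === 0 && onSegment(s.a, s.b, t.a)) return true;
  if (o2 === 0 && onSegment(s.a, s.b, t.b)) return true;
  if (o3 === 0 && onSegment(t.a, t.b, s.a)) return true;
  if (o4 === 0 && onSegment(t.a, t.b, s.b)) return true;
  return false;
}

// Brute force: test every pair and return the index pairs that intersect.
function findIntersectingPairs(segments) {
  const pairs = [];
  for (let i = 0; i < segments.length; i++) {
    for (let j = i + 1; j < segments.length; j++) {
      if (segmentsIntersect(segments[i], segments[j])) pairs.push([i, j]);
    }
  }
  return pairs;
}

const segments = [
  { a: { x: 0, y: 0 }, b: { x: 4, y: 4 } },
  { a: { x: 0, y: 4 }, b: { x: 4, y: 0 } },
  { a: { x: 5, y: 0 }, b: { x: 6, y: 1 } },
];
console.log(findIntersectingPairs(segments));
```

A brute-force check like this is handy as a reference when testing a sweep line implementation on small inputs. Note that adjacent polygon edges share an endpoint and would be reported here, so a polygon check would need to skip neighbouring edges.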

UK 3G data coverage in national parks and protected areas

Understanding 3G data coverage in remote areas is pretty important for environmentalist geeks, as our mobile apps will typically target use outside of urban, inhabited areas. To ensure our apps work, we have to know how much we can rely on the internet and cloud-based data processing and storage services over mobile 3G data connections. Ideally we’d have some numbers too.

Last Thursday, I presented some back-of-a-napkin GIS work on 3G data coverage in UK national parks and protected areas as part of the Symposium on Mobile & Conservation at the Oxford University Biodiversity Institute. I’ve written up the methods below, but the take-home message is that ~18% of UK protected areas have 3G data coverage, assuming Vodafone has a similar pattern of network distribution across the UK to other mobile operators.

Read the rest of this entry »

Postgres 9.0.1, hstore, PostGIS 1.5.2, GEOS 3.2.2 & GDAL 1.7 on Ubuntu 10.04 Lucid

Ubuntu 10.04 / Lucid is the latest long-term support release, and the one most of us will be using for our server deployments for now. Unfortunately, it was released just before the latest big releases in the FOSS GIS world: Postgres 9 and PostGIS 1.5.

Thankfully, it’s pretty simple to install these latest versions. Here is a quick rundown of the steps needed to install a great OSS server-side GIS stack with all these new toys using easy-to-remove .deb packages. Tested on EC2 with the latest stock 10.04 server AMI (ami-60067832).

  • PostgreSQL 9.0.1 + hstore NoSQL columns
  • PostGIS 1.5.2
  • GEOS 3.2.2
  • Proj 4.7
  • GDAL 1.7.2
  • Spatialite 2.4 RC4

All components with test PostGIS database example

# Add the PostgreSQL and UbuntuGIS PPAs, then install the packaged dependencies
sudo add-apt-repository ppa:pitti/postgresql
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get update
sudo apt-get install -y postgresql-9.0 postgresql-server-dev-9.0 postgresql-contrib-9.0 proj libgeos-3.2.2 libgeos-c1 libgeos-dev libgdal1-1.7.0 libgdal1-dev build-essential libxml2 libxml2-dev checkinstall

# Build PostGIS 1.5.2 from source and wrap it in a removable .deb with checkinstall
wget http://postgis.refractions.net/download/postgis-1.5.2.tar.gz
tar zxvf postgis-1.5.2.tar.gz && cd postgis-1.5.2/
sudo ./configure && sudo make && sudo checkinstall --pkgname postgis-1.5.2 --pkgversion 1.5.2-src --default

# As the postgres user, create a test database and load hstore and PostGIS into it
sudo su postgres
createdb -U postgres test_gis
createlang -dtest_gis plpgsql  # may report that plpgsql is already installed on 9.0; safe to ignore
psql -U postgres -d test_gis -f /usr/share/postgresql/9.0/contrib/hstore.sql  #<= hstore bootstrap, thanks to Paul Smith
psql -U postgres -d test_gis -f /usr/share/postgresql/9.0/contrib/postgis-1.5/postgis.sql
psql -U postgres -d test_gis -f /usr/share/postgresql/9.0/contrib/postgis-1.5/spatial_ref_sys.sql
psql -U postgres -d test_gis -c"select postgis_lib_version();"  # prints the PostGIS version if everything loaded
exit

If you just want PostgreSQL 9.0

sudo add-apt-repository ppa:pitti/postgresql
sudo apt-get update
sudo apt-get install postgresql-9.0

Thanks to James DeMichele & Brent Wood for feedback on the original (incorrect) article.

Web GIS data payload benchmarks

Hardware Accelerated web GIS

Vector rendering speeds in browsers are set to go through the roof in 2010 thanks to 2D/3D hardware acceleration. Though the demos shown so far mainly target 3D games, this next step of browser evolution is a huge deal for the web-based GIS/mapping community, which to date has been limited by poor browser performance with vector graphics. Yup, expect to see the words “revolution”, “lightning fast” et al. in a lot of GIS marketing materials (will make a nice change from “geowiki”).

What will these apps render?

Data, and lots of it. You are going to be able to handle vastly larger amounts of data on the client than the current generation of browser-based web apps can.

It’s a pretty safe bet that data transport efficiency will become increasingly important to the performance of your future web GIS application.

What’s the best way to send GIS data to the browser for render?

I took a look at the relative payload sizes of some of the most commonly used simple lossless web GIS data transport formats, and at what effect Gzipping (a common feature of most modern webservers) and MessagePack object serialization had on them:

  • GeoJSON
  • KML
  • GML
  • SVG
  • EWKT
  • EWKB

For test data, I used the boundaries of all 200+ countries found in the UN country boundary dataset we use at UNEP-WCMC, recording the byte size of each final payload.
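The measurement itself is easy to reproduce in outline. The sketch below uses Node's built-in zlib module to compare raw and Gzipped byte sizes for GeoJSON and WKT encodings of the same ring. It is not the benchmark code: the synthetic circular ring stands in for the real country boundaries, the helper names are invented here, and the numbers it prints say nothing about the results reported in this post.

```javascript
const zlib = require("zlib");

// Build a closed ring of n points on a circle (synthetic stand-in data).
function makeRing(n) {
  const ring = [];
  for (let i = 0; i < n; i++) {
    const angle = (2 * Math.PI * i) / n;
    ring.push([
      Number((10 * Math.cos(angle)).toFixed(6)),
      Number((10 * Math.sin(angle)).toFixed(6)),
    ]);
  }
  ring.push(ring[0]); // close the ring by repeating the first point
  return ring;
}

// Encode the ring as a GeoJSON Polygon geometry.
function toGeoJSON(ring) {
  return JSON.stringify({ type: "Polygon", coordinates: [ring] });
}

// Encode the same ring as WKT.
function toWKT(ring) {
  return `POLYGON((${ring.map(([x, y]) => `${x} ${y}`).join(",")}))`;
}

// Byte size of a text payload, before and after Gzip.
function payloadSizes(text) {
  return {
    raw: Buffer.byteLength(text, "utf8"),
    gzipped: zlib.gzipSync(text).length,
  };
}

const ring = makeRing(500);
console.log("GeoJSON", payloadSizes(toGeoJSON(ring)));
console.log("WKT    ", payloadSizes(toWKT(ring)));
```

Measuring bytes rather than string length matters once payloads contain non-ASCII characters, which is why the sketch uses Buffer.byteLength.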

GIS data payload sizes

[Figure: average GIS data payload sizes in bytes (smaller is better)]

Key points

1) There are big differences in the size of uncompressed data. WKB is by far the smallest.
2) MessagePack brings GeoJSON-style hash data structures to near-WKB sizes if Gzip is not possible.
3) Gzipping levels the playing field between the formats, and makes a huge difference.
4) When Gzipped, WKT, not WKB, has the smallest payload, by about 20%.
5) GeoJSON is the bulkiest of all the formats when Gzipped (even bulkier than KML and GML!).
6) The effect of MessagePack on Gzipped payloads is minimal.
7) MessagePack is only effective on native data structures; it does not compress strings.

GeoJSON isn’t the smallest, no clear winner

Though (somewhat shockingly) the bulkiest of the formats tested when Gzipped, GeoJSON probably offers the best current development experience due to the abundance of parsers, human readability and toolchain support, in exchange for a very small penalty in payload size. The only caveat is the dependence on Gzip, which may not be possible depending on traffic.

If you wanted to eke out the smallest payload possible, Gzipped WKT offers the best choice (a shame that any time saved would probably be lost in the parsers). A future benchmark could also add serialisation/deserialisation times to the mix.

Without Gzip, MessagePacking JSON data structures appears to be a very interesting alternative, providing similar payload sizes to plain WKB whilst offering simple and fast serialisation/deserialisation.

Keep an eye on SVG

Although looking at SVG makes me want to run screaming into the hills, I’d be hard pushed not to back SVG as a key lightweight GIS data transfer format of the future. Consider:

  • PostGIS can already output your geometry as SVG.
  • SVG payload sizes are smaller than GeoJSON, Gzipped or not.
  • SVG is natively renderable by all the latest browsers.
  • SVG will have an impressive toolchain developed around it.
  • SVG supports metadata.
  • The other formats will all need to be converted into SVG for hardware-accelerated rendering, a step which SVG payloads can skip.

Last notes on lossy optimisations

I just thought I’d add that if the geometry is not being roundtripped back to the server, overall payload sizes can be further reduced by simplifying the geometry and by snapping it to a grid, i.e. reducing the precision of the coordinates used.

Data used in analysis (no. bytes per payload) is available as a Fusion Table, code is on GitHub.
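As a rough illustration of the grid-snapping idea, the sketch below rounds each coordinate to a fixed number of decimal places and drops consecutive points that land on the same grid position. The function name and sample coordinates are invented for this example; on the server side, a database function such as PostGIS’s ST_SnapToGrid would be the more usual route.

```javascript
// Snap each [x, y] pair to a grid of `decimals` decimal places and drop
// consecutive points that collapse onto the same grid position.
function snapToGrid(coords, decimals) {
  const factor = 10 ** decimals;
  const out = [];
  for (const [x, y] of coords) {
    const sx = Math.round(x * factor) / factor;
    const sy = Math.round(y * factor) / factor;
    const prev = out[out.length - 1];
    if (!prev || prev[0] !== sx || prev[1] !== sy) out.push([sx, sy]);
  }
  return out;
}

// Hypothetical sample line; the first two points are less than a grid cell apart.
const line = [
  [-0.127758, 51.507351],
  [-0.127761, 51.507349],
  [-0.1281, 51.5079],
  [-0.1302, 51.5091],
];
console.log(JSON.stringify(snapToGrid(line, 3)));
```

Fewer digits per coordinate and fewer points both shrink the payload, but the result is lossy: snapping can collapse small rings or introduce self-intersections, so it suits display-only data.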

Handy TextMate snippet to convert hash syntax

This is one of those things that I’ve known existed for years, but I’ve always avoided because of a previous irrational fear of regexes.
Read the rest of this entry »

PostGIS manuals for when refractions forget to renew their domain names…

PDF
HTML