Introduction
As a nationwide app and platform, ParkMobile handles a lot of location-based searching. Our applications and services need the ability to quickly look up and display location-based information to users at scale. This is a common problem in map-based applications, but it is an especially difficult problem for a company with over 20 million users. Luckily, there is a simple and efficient method to make any location searchable in any database: Geohashing.
What is Geohashing?
The Geohash system is a public domain geocode system that offers a systematic way to subdivide coordinates on the surface of the earth with varying resolutions. A Geohash encodes a geographic location into a short string of letters and digits. For example, let’s consider a prominent landmark near ParkMobile’s main office–Georgia Tech’s Klaus Computing building. A Geohash for this building is dn5bpsb
. This hash is broad and includes most of the building, but we could use a different hash to be more specific. To specify the building’s entrance steps, we could use the Geohash dn5bpsbv6
. The first Geohash is accurate to approximately 150 meters, but the second is accurate to approximately 1 meter!
Why is the second Geohash for Georgia Tech’s Klaus Computing building more accurate? Upon closer inspection, you may notice two things.
- The more accurate Geohash is longer
- The more accurate Geohash includes all of the characters in the less accurate Geohash
We can see that the second Geohash is more accurate because it is adding additional characters, which specify a sub-section of the first Geohash. In other words, the second Geohash is actually the first Geohash, but with a higher precision.
Geohash Precision By Length
Geohash Length |
Lat Bits |
Lng Bits |
Lat Error |
Lng Error |
Km Error |
---|---|---|---|---|---|
1 |
2 |
3 |
±23 |
±23 |
±2500 |
2 |
5 |
5 |
±2.8 |
±5.6 |
±630 |
3 |
7 |
8 |
±0.70 |
±0.70 |
±78 |
4 |
10 |
10 |
±0.087 |
±0.18 |
±20 |
5 |
12 |
13 |
±0.022 |
±0.022 |
±2.4 |
6 |
15 |
15 |
±0.0027 |
±0.0055 |
±0.61 |
7 |
17 |
18 |
±0.00068 |
±0.00068 |
±0.076 |
8 |
20 |
20 |
±0.000085 |
±0.00017 |
±0.019 |
At first glance, the Geohash’s length-based precision may seem like an unimportant concept. However, a second look at this system reveals a powerful feature of Geohashing–less precise Geohashes can be used to quickly narrow a search to broad regions. At ParkMobile, we use this paradigm to perform very fast queries against a database based upon a precise Geohash. To accomplish this, we use less precise Geohashes as database keys. When a user performs a search with a precise latitude and longitude, we quickly convert those coordinates to a precise Geohash, parse out the prefix of a given location, and use that prefix to query against a broader regions in our database. This approach is extremely flexible, and it is not exclusive to databases. In fact, caching systems, as well as any other systems using key/value pairs for locations, can utilize this strategy.
Geohashing Limitations and Workarounds
Prefix Length
Although Geohashing has a lot of benefits, there are a few limitations as well. The most obvious limitation is in choosing a prefix length. It can be difficult to decide on the best prefix length, and your software system will be tightly coupled to the length that is chosen, so choose wisely. To mitigate this issue, ParkMobile utilizes multiple prefix lengths for our Availability systems. This helps us make our applications faster while also maintaining some precision when necessary.
For ParkMobile’s Availability System Geohash caching strategy, we use a prefix length of 7. This prefix length may seem unnecessarily precise, but a shorter prefix length would lack necessary precision and cause bad caching artifacts for a user’s view. Of course, the downside to this extra precision is that ParkMobile’s systems store redundant and overlapping data within each key/value pair for the cache. However, computing memory is cheap, and the system’s performance savings significantly outweigh the additional memory storage costs. By utilizing Geohash-based keys for caching implemented, we were able to improve our Inrix Availability system’s response time from ~500 ms/req on average to ~70 ms/req on average, a 700% improvement!
Neighbors
For ParkMobile’s CivicSmart Geohashing implementation, we use a more advanced strategy involving multiple prefix lengths. First, for Geohash queries we start by using a prefix length of 3 to filter out large portions of the map. Then, we use DynamoDB’s filter feature to narrow queries down further to a prefix length of 5. Finally, we use another powerful Geohashing feature to search all of the neighboring Geohashes for our more precise prefix.
Geohashing’s neighbors feature is quite powerful, and it enables us to find up to 8 neighboring prefixes for our CivicSmart search. For example, the Klaus Computing building’s Geohash (dn5bpsbv6
) has the following direct neighbors: dn5bpsbv9
, dn5bpsbvd
, dn5bpsbve
, dn5bpsbv3
, dn5bpsbv7
, dn5bpsbv1
, dn5bpsbv4
, dn5bpsbv5
. Utilizing these neighbors in our search gives us a much more precise area around a prefix and helps mitigate one of the key limitations of Geohashing.
Fixed Boundaries
There is another Geohashing limitation with larger than expected consequences–Geohashes do not always conform to geographical landmarks. Instead, Geohash prefixes have fixed boundaries. For example, the Atlanta Botanical Garden cannot be fully captured by a length 5 prefix. This means that it is split down the middle by the hashes: dn5bp
and dnh00
.
This poses a potential problem, but we can once again use Geohashing’s neighbor feature to get around this limitation.
Let’s use a precision length of 6 to capture all neighbors of the Atlanta Botanical Garden. The central prefix is dnh00n
, and the neighbors are dnh00p
, dnh00r
, dnh00q
, dnh00j
, dnh00m
, dn5bpz
, dn5bpy
, and dn5bpv
. Did you notice that some of these neighbors have different prefixes? This means that we can work around the limitation that a geographical location is contained in two separate and larger Geohash prefixes.
Final Thoughts
The Geohash system is a powerful tool for map and location-based applications. It allows applications to perform fast locations-based lookups and can help improve user experience with location-based caching. Geohashing has been an invaluable tool in ParkMobile’s efforts to improve performance of our availability systems.
Resources and Reading:
https://read.acloud.guru/location-based-search-results-with-dynamodb-and-geohash-267727e5d54f