SQL Server’s spatial capabilities have evolved significantly since their introduction in the 2008 release. Given the well documented beta (CTP), the March RTM, and April release, there has been a great deal of coverage of the new spatial features in SQL Server 2012. (See great posts by Alastair Aitchison and Lenni Lobel, and be sure to check out Ed Katibah’s blog. My favorite summary is a 12 minute video by Greg Low.)
I’d like to reflect on the six spatial enhancements to SQL Server 2012 that are of most interest to myself and Safe Software.
My Top 6 Takeaways from SQL Server Spatial 2012
Following the OGC model (like PostGIS), curve support comes with three new types: circular strings (sequences of circular arcs), compound curves (sequences of circular arcs and straight lines), and curve polygons (areas bounded by curved and/or straight segments).
SQL Server 2012 brings three major innovations to the table:
- Full support for existing operations: For example, you can get a curve’s length, or find the shortest line between anywhere on a curve and another geometry.
- A round-earth definition of curves: You can define curves in the geodetic “geography” type – an industry first – and again, all of the spatial operations just work. We see lots of lat/long data with curves inside. With SQL Server, you can now query and manipulate this data without stroking arcs into lines or projecting your data onto a locally-flat coordinate system.
- Buffering with curves: Buffering expands a geometry by a specified distance in every direction – and typically results in lots of round edges. SQL Server 2012 optionally represents the rounds bits as arcs, which makes the results more accurate and compact. As above, this works with both the locally-flat-earth (geometry) and globally-round-earth (geography) types.
The round-earth geography type can now represent shapes that don’t fit inside a hemisphere (very roughly: the ones that can’t be fully seen from any point in space). Taking that idea to the limit, there is also new way to specify “the whole earth”: FULLGLOBE. Bob Beauchemin has a nice pair of posts on the subject.
While there isn’t a lot of data that falls into this case, when you need it, it’s a big deal. It also means you can express concepts like “everywhere except ”, which can make for more expressive queries.
This extra power comes with extra responsibility – the sole difference between “somewhere” and “everywhere else” comes down to coordinate order – so more care is required to get this right.
Creating spatial indices in SQL Server 2008 is hard. Specifically, it’s not obvious how to choose good index parameters (and by good I mean “makes queries faster at all”). This is a big deal for two reasons: (i) For real-world data volumes, you need spatial indices to get reasonable performance, and (ii) If you’re using SQL Azure, you can’t even load spatial data without them.
So I am very excited that SQL Server 2012 lets you skip these parameters (you still need to know the bounds for projected data), and it figures everything out automatically.
One of the biggest hassles with SQL Server 2008’s round-earth geography type is that invalid data could not be loaded at all. SQL Server 2012 lets you load this data and repair it after – a huge convenience.
The new release also adds an improved validation method, IsValidDetailed(), that provides detailed answers to the question “why was my geometry considered invalid?”
As an experiment, I loaded a Shapefile containing country boundaries in lat/long into SQL Server 2008 and 2012. SQL Server 2008 rejects Fiji (because the data breaks it into two pieces along the 180 degree meridian, and multipolygons can’t share a common edge) and Antarctica (for a similar reason, but more generally due to the non-geodetic nature of the input data and issues near the poles). SQL Server 2012 loads all the data, and then provides tools to inspect and repair afterwards. Interestingly, it detects three invalid features, with Russia joining Fiji and Antarctica, perhaps due to precision issues (more on that below).
SQL Server 2012 uses more precision (48 bits instead of 27) for geometric operations, resulting in fewer slivers, less visible rounding of coordinates, and generally better results. In some cases, the improvements are dramatic: In Alastair’s blog, he gives an example of two lines; SQL Server 2008 says they don’t intersect, and SQL Server 2012 says they do. I’ve reproduced his results, and would like to add only one thing: a picture.
At Safe, we’ve spent a phenomenal amount of time battling problems like these, so this result isn’t entirely surprising. I do find it interesting how much easier it is for a person to see the answer, though (at least for this data).
An important aside: The increased precision is about processing, not storage. Coordinates are still stored as IEEE doubles, with their 53 bits of precision. Why do we care? It means the internal representation of existing geometric types (points, lines, …) hasn’t changed.
Relationship to SQL Azure
Today, SQL Server 2012 has more spatial functionality than SQL Azure – Azure is limited to operations that work on the old geometry types. Over time, that role will reverse, with new spatial features debuting in Azure, and eventually making their way to the next major release of SQL Server. This reinforces an ongoing trend – the cloud is getting more important.
Overall, I see SQL Server 2012’s spatial support as a well-thought out, significant evolution from the 2008 release, with improvements for all the major pain points. Are you using SQL Server Spatial? If so, what do you think of the new functionality?
Actually the line intersection issue is not testing whether they cross, but whether the computed intersection point lies on the lines. The increase in precision doesn’t solve this issue (in general, the intersection point of two line segments can’t be represented with finite precision), so it seems like SQL Server might have introduced a tolerance value into their spatial predicates?
I agree completely with your description and that increasing the precision isn’t a panacea. What I don’t know is if the improvement comes from (a) a tolerance value, (b) increased precision – meaning it will still break for other cases, (c) use of infinite precision logic in this particular case, or (d) something else. My only hint is that STIntersects doesn’t take a tolerance value, there isn’t an equivalent method that does, and there isn’t another place to provide one.
Does FME support the new SQL Server 2012 release? (In FME Server 2012)
No, FME 2012 does not support the new SQL Server 2012 additions (e.g., the curve and fullglobe features described in this post). FME can, however, read and write any geometry or geography data that was representable in SQL Server 2008. Concrete examples: You can read a polygon. You cannot read a curve polygon (translation fails). You can write a polygon. You can write a curve polygon, but the curved aspects will be “stroked”, i.e. turned into lines.
We are working on supporting the new features in FME 2013. If you would like to be notified when support is available in an FME 2013 beta, feel free to contact firstname.lastname@example.org and quote “PR 29754”. Or, if you prefer, you can write me directly at email@example.com.
Following up on the comments about “precision”. First, Martin’s comment above is right on. No matter how many precision bits you have, you will still need to resort to an exact arithmetic package for getting, for example, an in-circle test or a which-side-of-the-line-is-the-point-on test to be always correct. Second, in the geo-spatial world we mainly care about coordinates xyz and a few discrete attributes, but rarely about the kind of precision needed in numerical simulations. In fact, for xyz coordinate information and alike, a floating-point representation provides less precision for the same number of bits than a scaled integer representation does. I go on about this during the first five minutes of this video on point compression: http://www.youtube.com/watch?v=A0s0fVktj6U
I 100% agree with your comments. Any finite precision will get some of these questions wrong. Martin Davis alludes to one potential solution above – using a tolerance value – but I suspect there are a few ways to skin that cat. With regard to floating-point vs. scaled integers, there are many GIS packages that use scaled integers for exactly this reason. Your video explains the issue well – thanks for the link. Floating-point approaches have their benefits too: You don’t need to know the bounds of your data ahead of time, and no lossy conversions are needed to/from Well Known Binary, the most common interchange format for vector geometry. The price: 8 bytes for 15 digits of precision (your video clearly shows why four bytes each may not be enough). It’s interesting how data volumes can change our perspectives about what’s important!
[…] noted previously, only SQL Server models round-earth curves. Is that a […]