Skip to main content

2 posts tagged with "geometry"

View All Tags

Optimizing Spatial Data Queries Using Spatial Index

· 4 min read

banner

This article discusses the inefficient existing implementation and documents the methods attempted to improve it.

Existing Issues

While it wasn't impossible to join tables scattered across multiple databases in a single query, it was challenging...

  1. Is a specific coordinate within area "a"?
  2. Writing join queries was difficult due to tables existing on physically different servers
    1. Why the need for a single query? Due to the large size of the data to be queried, I wanted to minimize the amount loaded into application memory as much as possible.
  3. Since DB joins were not possible, application joins were necessary, resulting in around 24 billion loops (60000 * 40000)
    1. Although processing time was minimized through partitioning, CPU load remained high due to the loops.
  4. Through the migration process of merging physically different databases into one, the opportunity for query optimization was achieved as joins became possible.

Approach

Given that the primary reason for not being able to use database joins had been resolved, I actively considered utilizing index scans for geometry processing.

  • Using PostGIS's GIST index allows for creating a spatial index similar to R-tree, enabling direct querying through index scans.
  • To use spatial indexing, a column of type geometry is required.
  • While latitude and longitude coordinates were available, there was no geometry type, so it was necessary to first create geometry POINT values using the coordinates.

To simulate this process, I prepared the exact same data as in the live DB and conducted experiments.

First, I created the index:

CREATE INDEX idx_port_geom ON port USING GIST (geom);

Then, I ran the PostGIS contains function:

SELECT *
FROM ais AS a
JOIN port AS p ON st_contains(p.geom, a.geom);

Awesome...

Results

Before Applying Spatial Index

1 minute 47 seconds to 2 minutes 30 seconds

After Applying Spatial Index

0.23 milliseconds to 0.243 milliseconds

I didn't prepare a capture, but before applying the index, queries took over 1 minute and 30 seconds.

Let's start with the conclusion and then delve into why these results were achieved.

GiST (Generalized Search Tree)

A highly useful index for querying complex geometric data, the internal structure is illustrated below.

The idea of an R-tree is to divide the plane into rectangles to encompass all indexed points. Index rows store rectangles and can be defined as follows:

"The point we are looking for is inside the given rectangle."

The root of the R-tree contains several of the largest rectangles (which may intersect). Child nodes contain smaller rectangles included in the parent node, collectively encompassing all base points.

In theory, leaf nodes should contain indexed points, but since all index rows must have the same data type, rectangles reduced to points are repeatedly stored.

To visualize this structure, let's look at images for three levels of an R-tree. The points represent airport coordinates.

Level one: two large intersecting rectangles are visible.

Two intersecting rectangles are displayed.

Level two: large rectangles are split into smaller areas.

Large rectangles are divided into smaller areas.

Level three: each rectangle contains as many points as to fit one index page.

Each rectangle contains points that fit one index page.

These areas are structured into a tree, which is scanned during queries. For more detailed information, it is recommended to refer to the following article.

Conclusion

In this article, I briefly introduced the specific conditions, the problems encountered, the efforts made to solve them, and the basic concepts needed to address these issues. To summarize:

  • Efficient joins using indexes could not be performed on physically separated databases.
  • By enabling physical joins through migration, significant performance improvements were achieved.
  • With the ability to utilize index scans, overall performance was greatly enhanced.
  • There was no longer a need to unnecessarily load data into application memory.
  • CPU load due to loops was alleviated.

Reference

Utilizing Ellipsoids on Earth with Kotlin

· 3 min read

Background

earth image reference1

Considering that the Earth is neither flat nor a perfect sphere, but an irregular ellipsoid, there is no perfect formula for quickly and accurately calculating the distance between two points at different longitudes and latitudes.

However, by using the geotools library, you can obtain mathematically corrected approximations quite easily.

Adding Dependencies

To use the Earth ellipsoid in geotools, you need to add the relevant library dependencies.

repositories {
maven { url "https://repo.osgeo.org/repository/release/" }
maven { url "https://download.osgeo.org/webdav/geotools/" }
mavenCentral()
}

dependencies {
...
implementation 'org.geotools:gt-referencing:26.2'
...
}

Writing Code

First, define the coordinates of Seoul and Busan as an enum class.

enum class City(val latitude: Double, val longitude: Double) {
SEOUL(37.5642135, 127.0016985),
BUSAN(35.1104, 129.0431);
}

Next, let's look at a simple usage example through a test code.

class EllipsoidTest {

@Test
internal fun createEllipsoid() {
val ellipsoid = DefaultEllipsoid.WGS84 // Creates an ellipsoid that is as close to the Earth as possible using the WGS84 geodetic system used in GPS

val isSphere = ellipsoid.isSphere // Determines if it is a sphere or an ellipsoid
val semiMajorAxis = ellipsoid.semiMajorAxis // Equatorial radius, the longer radius of the ellipsoid
val semiMinorAxis = ellipsoid.semiMinorAxis // Polar radius, the shorter radius of the ellipsoid
val eccentricity = ellipsoid.eccentricity // Eccentricity, indicates how close the ellipsoid is to a sphere
val inverseFlattening = ellipsoid.inverseFlattening // Inverse flattening value
val ivfDefinitive = ellipsoid.isIvfDefinitive // Indicates if the inverse flattening is definitive for this ellipsoid

// Orthodromic distance
val orthodromicDistance = ellipsoid.orthodromicDistance(
City.SEOUL.longitude,
City.SEOUL.latitude,
City.BUSAN.longitude,
City.BUSAN.latitude
)

println("isSphere = $isSphere")
println("semiMajorAxis = $semiMajorAxis")
println("semiMinorAxis = $semiMinorAxis")
println("eccentricity = $eccentricity")
println("inverseFlattening = $inverseFlattening")
println("ivfDefinitive = $ivfDefinitive")
println("orthodromicDistance = $orthodromicDistance")
}
}
isSphere = false
semiMajorAxis = 6378137.0
semiMinorAxis = 6356752.314245179
eccentricity = 0.08181919084262128
inverseFlattening = 298.257223563
ivfDefinitive = true
orthodromicDistance = 328199.9794919944

You can create an Earth ellipsoid with DefaultEllipsoid.WGS84. If you use SPHERE instead of WGS84, a sphere with a radius of 6371km will be created.

The distance result is in meters (m), so converting it to kilometers shows approximately 328km. If you search on Google, you may find 325km, so considering that there may be differences between the coordinates I chose and those chosen by Google, this is not a bad figure.

There are many other functions available as well. However, covering them all in this post would be too extensive, so if needed, I will cover them in another post.

info

The margin of error may not be satisfactory depending on business requirements, so before actual implementation, make sure to thoroughly test other methods in geotools.


Footnotes

  1. An Overview of SRID and Coordinate System