Correct me if I'm wrong.
There are three approaches to get the nearest homes, users have created in my website:
- To create a table with two columns(latitude, longitude) that both of them are float and say:
Here it is:
$latitude = 50;
$longitude = 60;
SELECT * FROM my_table
WHERE (latitude <= $latitude+10 AND latitude >= $latitude-10)
AND (longitude <= $longitude+10 AND longitude >= $longitude-10)
that 10 here means 1km for example.
In this approach we can also use harvesine formula.
To merge those columns(latitude, longitude) to one column named point as POINT type and again search each row one by one.
To categorize multiple points(the coordinates of homes users have created) as a category for one section of a country i.e. city and if a query comes with $latitude and $longitude to see the nearest homes, I will check in which category they are stored IN ORDER NOT TO search all rows but search only the section this query(coordinate) belongs to.
As I guess approach number 1 is slow because of the conditions for each row of table and again slow if I use harvesine formula.
If I use ST_Distance it seems again it's slow because again it just has lots of calculations.
But if I use approach number 3 it seems it is faster to check each section for an specific point user is than check all rows. I know how to set point for each home however I don't know how to create multiple home positions as a section maybe in another table.
BTW in new versions of MySQL and MariaDB Spatial Indexes are supported in InnoDB.
My questions:
Is approach number 1 really slow or other ST_* functions are the same as this approach to check all rows with those formulas mentioned there one by one? Which one is faster?
Does approach number 2 do something other than simple conditions to make it faster? I mean does it make any changes when using type of POINT instead of float and using ST_* functions instead of doing it myself? I want to know whether the algorithm is different.
If approach number 3 is the fastest in these three approaches, how can I categorize points in order not to search all rows in a table?
How can I use Spatial Indexes to make it as fast as possible?
If any other approaches exist and I didn't mention, could you please tell me how can I get the nearest homes just by having coordinates in MySQL/MariaDB in PHP/Laravel?
Thanks All
Which formula you use for the distance doesn't matter much. What matters much more is the number of rows which you have to read, process and sort. In best case you can use an index for a condition in the WHERE clause to limit the number of processed rows. You can try to categorize your locations - But it depends on the nature of your data, if that is going to work well. You would also need to find out which "category" to use. A more general solution would be to use a SPATIAL INDEX and the ST_Within() function.
Now let's run some tests..
In my DB (MySQL 5.7.18) I have the following table:
The data comes from Free World Cities Database and contains 3173958 (3.1M) rows.
Note that
geoPointis redundant and equal toPOINT(longitude, latitude).Concider the user is located somewhere in London
and you want to find the nearest location from the
citiestable.A "trivial" query would be
The result is
Execution time: ~ 4.970 sec
If you use the less complex function
ST_Distance(), you get the same result with an execution time of ~ 4.580 sec - which is not so much difference.Note that you don't need to store a geo point in the table. You can as good use
(point(c.longitude, c.latitude)instead ofc.geoPoint. To my surprise it is even faster (~3.6 sec forST_Distanceand ~4.0 sec forST_Distance_Sphere). It might be even faster if I didn't have ageoPointcolumn at all. But that still doesn't matter much, since you don't want the user to wait so log for a respose, if you can do better.Now let's look how we can use the SPATIAL INDEX with
ST_Within().You need to define a polygon which will contain the nearest location. A simple way is to use ST_Buffer() which will generate a polygon with 32 points and is nearly a circle*.
The result is the same. The execution time is ~ 0.000 sec (that's what my client (HeidiSQL) says).
* Note that the
@radiusis notated in degrees and thus the polygon will be more like an ellipse rather than a circle. But in my tests I always got the same result as with the simple and slow solution. I would though investigate more edge cases, before I use it in my production code.Now you need to find the optimal radius for your application/data. If it's too small - you might get no results, or miss the nearest point. If it's too big - you might need to process too many rows.
Here some numbers for the given test case: