I have a fairly small dataset (only 138 observations) whose points are all very close to each other geographically. I would like to create clusters from this data.
The requirements for my clusters would be:
- ideally 10-12 clusters overall, with around 10-12 locations in each
- locations in the same cluster within roughly 5-10 km of each other
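To make these two requirements checkable for any clustering result, here is a minimal sketch that reports each cluster's size and its largest pairwise great-circle distance. The helpers `haversine_km` and `cluster_summary` and the example labels are hypothetical, made up for illustration; only the four coordinates come from my data.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in km

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, long) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def cluster_summary(points, labels):
    """Return {label: (size, max pairwise distance in km)} for each cluster."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: (len(members),
                max((haversine_km(p, q) for p, q in combinations(members, 2)),
                    default=0.0))
        for label, members in groups.items()
    }

# First four rows of my data, with made-up labels purely for illustration
points = [(33.1176813, -96.6889152), (33.1311714, -96.6673619),
          (33.0960563, -96.6961237), (33.1453321, -96.6842826)]
summary = cluster_summary(points, [0, 0, 1, 1])
for label, (size, diameter_km) in summary.items():
    print(f"cluster {label}: {size} points, max distance {diameter_km:.2f} km")
```

A cluster would meet my requirements if its size is around 10-12 and its maximum distance stays under the 5-10 km limit.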
I have tried two techniques:
KMeans with the number of clusters set to 12. This gave decent results, but I also know that KMeans is not well suited to lat/long data, because it treats the coordinates as a flat plane and uses Euclidean distance.
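One reason raw lat/long is awkward for a Euclidean method is that, away from the equator, a degree of longitude covers less ground than a degree of latitude. A minimal sketch of one possible workaround, assuming the area is small enough for a local equirectangular projection to be a reasonable approximation: convert the points to x/y offsets in km before clustering. `project_to_km` is a hypothetical helper written for this post, not a library function.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in km

def project_to_km(points):
    """Project (lat, long) degrees to (x, y) km offsets from the mean point.

    Uses a local equirectangular approximation, which is only reasonable
    when all points lie within a small area.
    """
    lat0 = sum(lat for lat, _ in points) / len(points)
    lon0 = sum(lon for _, lon in points) / len(points)
    cos_lat0 = math.cos(math.radians(lat0))  # shrink factor for longitude
    return [
        (EARTH_RADIUS_KM * math.radians(lon - lon0) * cos_lat0,
         EARTH_RADIUS_KM * math.radians(lat - lat0))
        for lat, lon in points
    ]

# First two rows of my data
xy_km = project_to_km([(33.1176813, -96.6889152), (33.1311714, -96.6673619)])
(x1, y1), (x2, y2) = xy_km
print(f"planar distance: {math.hypot(x2 - x1, y2 - y1):.2f} km")
```

Euclidean distances between the projected points are then approximately distances in km, so a Euclidean method such as KMeans could be fit on `xy_km` instead of on the raw degrees.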
DBSCAN, which I am trying now. I think I am hitting an issue where the dataset is simply too small: it outputs only one cluster for all of my observations.
In this paper I see it mentioned that DBSCAN gives nonsensical results on such a small dataset.
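Besides the dataset size, it may be worth comparing eps with the overall extent of the points: if eps is a large fraction of that extent, DBSCAN can chain everything into a single cluster regardless of how many observations there are. A minimal sketch of that check, using the southernmost and northernmost points from my data; `haversine_km` is a hypothetical helper written for this post.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in km

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, long) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Southernmost and northernmost points in my data (lat, long in degrees)
south = (33.0914204, -96.6888572)
north = (33.1582715, -96.6965416)

extent_km = haversine_km(south, north)
eps_km = 5  # the search radius I pass to DBSCAN, in km

print(f"south-to-north extent: {extent_km:.2f} km, eps: {eps_km} km")
print(f"eps covers {eps_km / extent_km:.0%} of that extent")
```

If eps turns out to cover most of the extent, a much smaller eps (hundreds of metres rather than kilometres) might be the first thing to try before ruling out DBSCAN.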
I want to create these clusters with a better approach than KMeans, and I want to know whether I am missing something in my code or whether there is a better approach to take here. Here is my code:
from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint
import pandas as pd
import numpy as np

# Load the coordinates as (lat, long) pairs in decimal degrees
coords = pd.read_csv("locs.csv")
coords1 = coords.to_numpy()

# The haversine metric works in radians, so convert the 5 km radius to radians
kms_per_radian = 6371.0088
max_distance = 5
epsilon = max_distance / kms_per_radian

db = DBSCAN(eps=epsilon, min_samples=5, algorithm='ball_tree', metric='haversine').fit(np.radians(coords1))
cluster_labels = db.labels_

# DBSCAN labels noise points as -1, so exclude that label from the cluster count
num_clusters = len(set(cluster_labels) - {-1})
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))
Here is my data; sorry if it shows up sloppily. It gets read in from locs.csv.
lat long
33.1176813 -96.6889152
33.1311714 -96.6673619
33.0960563 -96.6961237
33.1453321 -96.6842826
33.1391395 -96.6758389
33.0971504 -96.6870497
33.1262443 -96.6878928
33.1143839 -96.6945359
33.114209 -96.6945359
33.1149433 -96.6927684
33.109267 -96.6937003
33.1331405 -96.6666864
33.1383939 -96.6680508
33.1472743 -96.688812
33.1057274 -96.680582
33.1368991 -96.6796421
33.107386 -96.678968
33.150424 -96.6965207
33.1279273 -96.6806465
33.1087588 -96.6876371
33.1262534 -96.6742925
33.098562 -96.6853026
33.1125942 -96.6883732
33.1102927 -96.6961864
33.1035043 -96.6955179
33.1243762 -96.6788757
33.1291342 -96.677637
33.1156607 -96.6858248
33.093478 -96.6997758
33.1260525 -96.6890209
33.0916379 -96.6965207
33.1283669 -96.6724685
33.1033839 -96.6874843
33.1483239 -96.6828948
33.157496 -96.684684
33.0956047 -96.6976071
33.0956047 -96.6976071
33.1057274 -96.680582
33.1574837 -96.6850944
33.1582715 -96.6965416
33.1386326 -96.6783054
33.158112 -96.684861
33.138768 -96.675811
33.098562 -96.6853026
33.1030668 -96.6955179
33.107386 -96.678968
33.1379475 -96.6698679
33.1117192 -96.6878473
33.0926369 -96.691028
33.092519 -96.6948466
33.1542432 -96.6823992
33.1542432 -96.6823992
33.1330701 -96.6743789
33.1309656 -96.6879346
33.1469985 -96.6874333
33.1454754 -96.6839861
33.1261935 -96.6725202
33.1566998 -96.6841281
33.1566998 -96.6841281
33.107347 -96.6921094
33.107147 -96.6817192
33.1081097 -96.6880473
33.1243427 -96.6773343
33.1294931 -96.685219
33.1089024 -96.6894143
33.1348689 -96.6686227
33.125196 -96.6825663
33.1239856 -96.6892297
33.1549715 -96.6965207
33.1033242 -96.6887841
33.098562 -96.6853026
33.1360933 -96.67346
33.1081031 -96.6886583
33.1552268 -96.69299
33.1323984 -96.6658496
33.1262448 -96.6740307
33.1257552 -96.6869133
33.1257552 -96.6869133
33.1143839 -96.6945359
33.126066 -96.692029
33.1374841 -96.6677798
33.1405272 -96.6739612
33.1129799 -96.6893467
33.1320952 -96.6811459
33.1239267 -96.6899191
33.1252 -96.6837151
33.1033242 -96.6887841
33.1123512 -96.6774962
33.103736 -96.6867271
33.094192 -96.7010288
33.1392404 -96.6783889
33.1527167 -96.683548
33.1129669 -96.6826289
33.1193324 -96.6887284
33.1519071 -96.6921126
33.1358899 -96.6758085
33.1358899 -96.6758085
33.0987211 -96.6966042
33.1226774 -96.6762474
33.0968079 -96.6967923
33.1393173 -96.6642923
33.108696 -96.6970012
33.1078495 -96.6846551
33.1243762 -96.6788757
33.1243762 -96.6788757
33.1518825 -96.6811668
33.1239576 -96.6921962
33.1483239 -96.6828948
33.1125542 -96.6792695
33.1517525 -96.6835898
33.1145307 -96.6889703
33.1235157 -96.6751516
33.1549715 -96.6965207
33.1041763 -96.6868112
33.1352762 -96.6691161
33.1311522 -96.6726246
33.1526416 -96.6857622
33.1391395 -96.6758389
33.114209 -96.6945359
33.1343714 -96.6649288
33.1243142 -96.6937003
33.1343154 -96.6788902
33.0914204 -96.6888572
33.1087365 -96.6946195
33.1087365 -96.6946195
33.1123452 -96.6715541
33.1453321 -96.6842826
33.1573651 -96.6865559
33.1302622 -96.6730521
33.1549715 -96.6965207
33.1266383 -96.6807078
33.1091322 -96.6884788
33.114369 -96.6825872
33.1512367 -96.68994
33.126426 -96.6910263
33.1117173 -96.6940972
33.1117173 -96.6940972
33.1061857 -96.6810057