I have a set of N 3D points (the vertices of a closed line). Most of them lie in the same plane, whereas a few are slightly shifted. I need to find the plane that already contains the maximum number of points, and project the remaining (shifted) points onto it.
To do so, I iterate through all consecutive triplets of points: (0,1,2), then (1,2,3), ... up to (N-1,0,1). At each step, I build the plane passing through these 3 points and compute the signed distances from that plane to all other points.
This gives me a matrix of distances D[i,j] = d, where i is the index of the plane (i.e. the index of the first point of the triplet defining it) and j is the index of any other point of the set, lying at distance d from plane i. If the distance d between a point and the plane is 0, the point is already on the plane.
Here is an initial set of points:
array([[ 8.563 , 8.2252, 18.6602],
[ 8.563 , 8.2252, 22.3125],
[ 11.7319, 1.729 , 22.3125],
[ 11.7319, 1.729 , -19.207 ],
[ 8.084 , 9.207 , -19.207 ],
[ 8.084 , 9.207 , 18.6602],
[ 8.563 , 8.2252, 18.6602]])
# the resulting distance matrix:
distance_matrix = array([
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -6.27855770e-05, -6.27855770e-05],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -6.27855770e-05, -6.27855770e-05],
[ 5.45419753e-05, 5.45419753e-05, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 5.45419753e-05, 5.45419753e-05, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, -4.15414572e-04,-4.15414572e-04, 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 4.15414572e-04, 4.15414572e-04, 0.00000000e+00, 0.00000000e+00]])
Each green block corresponds to the triplet of points defining the plane of row i.
At this stage, I pick one of the points with the largest distance to a plane, i.e. at position (4,2), and project this point (4) onto that plane (2), replacing it in the original array. Then I recompute the planes and distances.
After that first iteration, with the new coordinates of that point, the distance matrix changes to something really close to zero (because the points were initially all supposed to be part of the same plane):
array([
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 4.65661287e-10, 4.65661287e-10],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, -4.65661287e-10, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])
But it is stuck there: the matrix never changes beyond what is shown above, however many iterations I run. The remaining value, -4.65661287e-10, is not small enough to be treated as 0, so third-party tools never consider my point set to be coplanar.
Does this number have some special meaning, and is there anything I can do to lower it?
Here is the current code; it's a bit rough, but it should work:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Dec 13 11:35:48 2022
@author: s.k.
LICENSE:
MIT License
Copyright (c) 2022-now() s.k.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Except as contained in this notice, the name of the copyright holders shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in this Software without prior written authorization from the
copyright holders.
"""
import numpy as np
from shapely import wkt
from shapely.geometry import Polygon, MultiLineString

def find_plane(points):
    """Find the coefficients (a,b,c,d) satisfying the plane equation:
    ax + by + cz + d = 0 given a 3x3 array of row-wise points.
    """
    # Work on a copy, otherwise the numpy array passed in gets modified:
    pts = points.copy()
    p0 = points[2,:].copy()
    center = np.average(pts, axis=0, keepdims=True)
    pts -= center # reduce large coordinates
    u, v = pts[1:,:] - pts[0,:]
    n = np.cross(u, v)
    n_unit = n / np.linalg.norm(n)
    d = -1 * (p0 @ n_unit)
    return (np.append(n_unit, d), center)

def closest_distance_to_plane(points,plane):
    if len(points.shape) == 1:
        points = points.reshape(1,len(points)) # reshape 1dim vectors to 2D
    nbpts, lp = points.shape
    # here we can work with homogeneous coordinates
    points = np.append(points, np.ones(nbpts).reshape(nbpts,1), axis=1)
    dists = points @ plane
    return dists

def project_points_on_plane(points, plane, pt_on_plane):
    new_points = None
    if len(points.shape) == 1:
        points = points.reshape(1,len(points)) # reshape 1dim vectors to 2D
    nbpts, lp = points.shape
    n = plane[:-1]
    new_points = points - ((points - pt_on_plane) @ n) * n
    return new_points

def get_distances_to_planes(points):
    lp = np.size(points,0)
    shift = 2
    p = np.append(points, points[:shift,:], axis=0)
    planes = []
    dists = np.zeros((lp, lp), dtype=np.double)
    for i in range(lp): # loop over planes
        include_idx = np.arange(i,i+3)
        mask = np.zeros(lp+shift, dtype=bool)
        mask[include_idx] = True
        plane, pt_on_plane = find_plane(p[mask,:])
        planes.append(plane)
        mask2 = mask.copy()
        if i > 1:
            mask2[:shift] = mask2[-shift:]
        mask2 = mask2[:-shift]
        for j, pt in enumerate(p[:-shift]): # loop over remaining points
            if ~mask2[j]:
                dist = closest_distance_to_plane(pt, plane)
                dists[i,j] = dist
    return dists

def clean_plane(wkt_geom):
    k = 1
    dists = np.array([1])
    new_geom = wkt_geom
    P = wkt.loads(new_geom)
    # remove last point as it's a duplicate of the first:
    p = np.array(P.geoms[0].coords)[:-1]
    lp = np.size(p,0)
    dists = get_distances_to_planes(p)
    max_dists = np.max(np.abs(dists))
    print(f"max_dists init: {max_dists}")
    while max_dists != 0 and k <= 20:
        print(f"Iter {k}...")
        idx_max_sum = np.argwhere(dists == np.amax(dists))
        planes_max, pts_max = set(idx_max_sum[:,0]), set(idx_max_sum[:,1])
        # pick only the first plane for the moment:
        plane_idx = list(planes_max)[0]
        include_idx = np.arange(plane_idx, plane_idx+3)
        include_idx = include_idx%lp
        mask = np.zeros(lp, dtype=bool)
        mask[include_idx] = True
        # TODO: verify for singularities here:
        plane, pt_on_plane = find_plane(p[mask,:])
        for pt_max in pts_max:
            p[pt_max] = project_points_on_plane(p[pt_max], plane, pt_on_plane)
        new_geom = Polygon(p)
        dists = get_distances_to_planes(p)
        max_dists = np.max(np.abs(dists))
        print(f"max_dists: {max_dists}")
        k += 1 if max_dists != 0 else 21
    return new_geom.wkt

wkt_geom = '''MULTILINESTRING Z ((
2481328.563000001 1108008.2252000012 58.66020000015851,
2481328.563000001 1108008.2252000012 62.312500000349246,
2481331.731899999 1108001.7289999984 62.312500000349246,
2481331.731899999 1108001.7289999984 20.79300000029616,
2481328.083999999 1108009.2069999985 20.79300000029616,
2481328.083999999 1108009.2069999985 58.66020000015851,
2481328.563000001 1108008.2252000012 58.66020000015851
))'''
clean_plane(wkt_geom)
It should print:
max_dists init: 0.00041541503742337227
Iter 1...
max_dists: 4.656612873077393e-10
Iter 2...
max_dists: 4.656612873077393e-10
Iter 3...
max_dists: 4.656612873077393e-10
Iter 4...
max_dists: 4.656612873077393e-10
Iter 5...
max_dists: 4.656612873077393e-10
Iter 6...
max_dists: 4.656612873077393e-10
Iter 7...
max_dists: 4.656612873077393e-10
Iter 8...
max_dists: 4.656612873077393e-10
Iter 9...
max_dists: 4.656612873077393e-10
Iter 10...
max_dists: 4.656612873077393e-10
Iter 11...
max_dists: 4.656612873077393e-10
Iter 12...
max_dists: 4.656612873077393e-10
Iter 13...
max_dists: 4.656612873077393e-10
Iter 14...
max_dists: 4.656612873077393e-10
Iter 15...
max_dists: 4.656612873077393e-10
Iter 16...
max_dists: 4.656612873077393e-10
Iter 17...
max_dists: 4.656612873077393e-10
Iter 18...
max_dists: 4.656612873077393e-10
Iter 19...
max_dists: 4.656612873077393e-10
Iter 20...
max_dists: 4.656612873077393e-10
This is the array containing the initial i planes with parameters a,b,c,d (=columns of the array):
[array([ 8.98767249e-01, 4.38426085e-01, -0.00000000e+00, -2.71591655e+06]),
array([ 8.98767249e-01, 4.38426085e-01, 0.00000000e+00, -2.71591655e+06]),
array([ 8.98763941e-01, 4.38432867e-01, 0.00000000e+00, -2.71591586e+06]),
array([ 8.98763941e-01, 4.38432867e-01, 0.00000000e+00, -2.71591586e+06]),
array([ 8.98742050e-01, 4.38477740e-01, -0.00000000e+00, -2.71591126e+06]),
array([-8.98742050e-01, -4.38477740e-01, 0.00000000e+00, 2.71591126e+06])]

With one call to lstsq you get what, for many purposes, is a perfect fit:

Is projection even necessary? Matrix multiplication of an augmented xyz1 matrix with [1, p0, p1, p2] should yield zeros, and in actuality yields
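The fit and the residual product can be reproduced along these lines. This is a minimal sketch, not necessarily the code used for the numbers in this answer: it assumes the six truncated points from the question (closing duplicate dropped) and the convention x + p0*y + p1*z + p2 = 0 implied by the [1, p0, p1, p2] vector. The names `xyz`, `A`, `p`, `xyz1` and `residuals` are illustrative.

```python
import numpy as np

# Truncated points from the question (closing duplicate omitted).
xyz = np.array([
    [ 8.563 , 8.2252,  18.6602],
    [ 8.563 , 8.2252,  22.3125],
    [11.7319, 1.729 ,  22.3125],
    [11.7319, 1.729 , -19.207 ],
    [ 8.084 , 9.207 , -19.207 ],
    [ 8.084 , 9.207 ,  18.6602],
])

# Assumed model: x + p0*y + p1*z + p2 = 0, i.e. solve [y z 1] @ p = -x
# in the least-squares sense.
A = np.column_stack((xyz[:, 1], xyz[:, 2], np.ones(len(xyz))))
p, *_ = np.linalg.lstsq(A, -xyz[:, 0], rcond=None)

# Augmented xyz1 matrix times [1, p0, p1, p2]; all zeros for a perfect fit.
xyz1 = np.column_stack((xyz, np.ones(len(xyz))))
residuals = xyz1 @ np.concatenate(([1.0], p))

print("p =", p)
print("residuals =", residuals)
```

The residuals are measured along x rather than perpendicular to the plane, which is what this particular parameterisation gives for free.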
This is without your replace-and-iterate algorithm. Given that your input values are all rounded to four decimals after the point, most of the error shown above is quite possibly a product of rounding error at the input; but it's impossible to say for sure without having more information about the problem.
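Separately from input rounding, the plateau value in the question, 4.656612873077393e-10, equals 2**-31. That is the spacing between adjacent float64 values for magnitudes between 2**21 and 2**22, which is where both the full-precision x coordinates (about 2.48e6) and the plane offsets d (about 2.7e6) sit. This suggests the floor is the resolution of the arithmetic at that magnitude rather than anything geometric. A small check, assuming NumPy's `np.spacing`; the local origin of 2481320 is an arbitrary illustrative choice:

```python
import numpy as np

x = 2481328.563000001   # an x coordinate from the WKT in the question
d = 2.71591655e+06      # magnitude of the plane offset d printed in the question

print(np.spacing(x))    # gap between x and the next larger float64
print(np.spacing(d))    # same gap at the magnitude of d
print(2.0 ** -31)       # compare with the plateau value in the question

# Illustrative only: after shifting to a local origin the gap is far smaller.
x_local = x - 2481320.0
print(np.spacing(x_local))
```

If that reading is right, computing distances as `(point - center) @ n` in coordinates shifted near the origin, instead of using the large absolute d, should lower the floor by several orders of magnitude, though still not to exactly 0.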
Note further that your plane of fit has parameters
[ 4.87814319e-01 -7.85335125e-07 -1.25753338e+01]
So "x" can be fully determined by "y", and "z" has negligible effect. Geometrically, this means: calling this a 3D problem is sort of a lie, because the input locus exists almost perfectly in a 2D plane parallel to the "z" axis, basically equivalent to a line in xy space.
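One way to see how little "z" matters is to fit the same model with and without the z column and compare. This is a sketch under the same assumptions as the parameters above (truncated points from the question, convention x + p0*y + p1*z + p2 = 0); `p_xyz`, `q_xy` and `z_effect` are illustrative names.

```python
import numpy as np

# Truncated points from the question (closing duplicate omitted).
xyz = np.array([
    [ 8.563 , 8.2252,  18.6602],
    [ 8.563 , 8.2252,  22.3125],
    [11.7319, 1.729 ,  22.3125],
    [11.7319, 1.729 , -19.207 ],
    [ 8.084 , 9.207 , -19.207 ],
    [ 8.084 , 9.207 ,  18.6602],
])
x, y, z = xyz.T
ones = np.ones(len(xyz))

# Full model: x + p0*y + p1*z + p2 = 0
p_xyz, *_ = np.linalg.lstsq(np.column_stack((y, z, ones)), -x, rcond=None)

# Reduced model ignoring z: x + q0*y + q1 = 0 (a line in the xy plane)
q_xy, *_ = np.linalg.lstsq(np.column_stack((y, ones)), -x, rcond=None)

# Largest contribution of the z term over the data, in coordinate units.
z_effect = np.max(np.abs(p_xyz[1] * z))

print("with z:   ", p_xyz)
print("without z:", q_xy)
print("max |p1*z| =", z_effect)
```

If the two fits agree closely and `z_effect` is tiny compared with the coordinates, the problem is effectively a line fit in xy.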
If you want to eliminate your worst outlier, fine; but I doubt it will be possible to reduce your error to exactly 0. For me to reproduce your results you would need to post the input arrays at full precision and not truncated along with your code. In fact, when I eliminate your worst outlier and ignore "z", I get error five orders of magnitude lower than yours: