Generate P random N-dimensional points from list of ALL possible pairwise distances


I would like to generate random N-dimensional points with the constraint of having precise Euclidean distances (known) between each other.

Number of points: P = 100. Number of dimensions of the hyperspace: N = 512.

Consequently, the possible number of pairwise distances is given by the formula L = P*(P-1)/2. If P = 100, then L = 4950.
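As a quick sanity check of that pair count (a minimal snippet):

```python
from math import comb

P = 100
L = comb(P, 2)  # equivalent to P * (P - 1) // 2
print(L)  # 4950
```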

Let's say I have a list of 4950 distance values, where each value refers to a precise point-point combination.

Is it possible to implement this using numpy?

It is trivial to do when considering pairs of points (P = 2), since L = 1, but I'm trying to figure out whether it can be generalized to higher values of P.

This is my implementation for P = 2, considering set_dist as the desired distance value.

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

N = 512

set_dist = 5.

point_0 = np.random.rand(N).reshape(1, -1)
point_1 = np.random.rand(N).reshape(1, -1)
rand_dist = euclidean_distances(point_0, point_1)
# Scaling both points by the same factor scales their distance by that factor,
# so this forces the distance to be exactly set_dist.
point_0 = point_0 * set_dist / rand_dist
point_1 = point_1 * set_dist / rand_dist


btilly (accepted answer):

Well, you asked to do numpy.

import numpy as np

I will be using d with d[i][j] being the distance from point i to point j. It needs to be symmetric, with 0 on the diagonal. I will leave it up to you to get from your distance list to that matrix.
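One way to do that conversion (my own sketch, not part of the answer) uses np.triu_indices, assuming your flat list is ordered as (0,1), (0,2), ..., (0,P-1), (1,2), ...; adjust the indexing if your list uses a different order:

```python
import numpy as np

def distance_list_to_matrix(dist_list, p):
    """Build a symmetric distance matrix from a flat list of pairwise distances.

    Assumes dist_list is ordered (0,1), (0,2), ..., (0,p-1), (1,2), ...
    """
    d = np.zeros((p, p))
    i, j = np.triu_indices(p, k=1)
    d[i, j] = dist_list
    d[j, i] = dist_list  # mirror for symmetry; the diagonal stays 0
    return d
```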

Our first task will be to find our points in some coordinate system, and then randomize it. So we'll focus on that.

Suppose we have coordinates with only a few dimensions filled out. For example, in 3 dimensions we might know only each point's first coordinate. For all the points, we want to project them onto the dimensions that we don't know yet and find the distances there. Here is a utility function to do that using the Pythagorean theorem.

def project_distances(d, coords, point, epsilon=10**-10):
    points = d.shape[0]
    point_coords = coords[point].reshape(1, coords.shape[1])
    known_d2 = np.sum(np.square(coords - point_coords), axis=1)
    project_d2 = np.square(d[point]) - known_d2
    # Fix potential roundoff errors.
    project_d2 = np.where(project_d2 < epsilon, 0, project_d2)
    return np.sqrt(project_d2)

Now suppose we've done that projection. How do we find our next coordinate?

Well, we'll take point 0 arbitrarily and put it at the origin. Then for a second point we'll find the one that is farthest from it. Using Heron's formula we can find, for every point, the part of its distance not explained by the new dimension, which is the component at right angles to that dimension. Pythagoras then gives the absolute value of the coordinate along the new dimension, and by comparing with the second point we can figure out whether it is a + or a -.

def distances_to_normalized_coords (d, n, epsilon=10**-10):
    points = d.shape[0]
    # Start with everything 0.
    coords = np.zeros((points, n))
    project_d_from_0 = d[0]
    for dim in range(n):
        # Find the farthest away point to use to orient this dim.
        best_point = 0
        best_d = 0
        for point in range(1, points):
            if best_d < project_d_from_0[point]:
                best_point = point
                best_d = project_d_from_0[point]
        if best_d < epsilon:
            break # We're done.

        project_d_from_best = project_distances(d, coords, best_point, epsilon)
        # Use Heron's formula to find areas of triangles from 0 to best_point, to any point.
        s = (best_d + project_d_from_0 + project_d_from_best) / 2
        area2 = s * (s - best_d) * (s - project_d_from_0) * (s - project_d_from_best)
        area2 = np.where(area2 < epsilon, 0, area2)
        area = np.sqrt(area2)
        # area = base * height / 2. The base is best_d. Find the height.
        height = 2 * area / best_d
        # height is at right angles to this dimension. Find this dimension.
        distance2 = np.square(project_d_from_0) - np.square(height)
        distance2 = np.where(distance2 < epsilon, 0, distance2)
        distance = np.sqrt(distance2)

        # Actual coordinates can be +- distance. Which better explains project_d_from_best?
        predict_best_d_plus = np.sqrt(np.square(height) + np.square(best_d - distance))
        predict_best_d_minus = np.sqrt(np.square(height) + np.square(best_d + distance))
        column = np.where(np.abs(predict_best_d_plus - project_d_from_best) <= np.abs(predict_best_d_minus - project_d_from_best), distance, -distance)

        # Copy the column to the coordinates.
        for point in range(1, points):
            coords[point][dim] = column[point]

        # And reproject with current coords.
        project_d_from_0 = project_distances(d, coords, 0, epsilon)

    return coords

OK, now we know a single set of points with the right distances. Let's now move it around randomly.

from scipy.stats import ortho_group
def randomly_position(coords, deviation):
    points, n = coords.shape
    center = np.sum(coords, axis=0).reshape(1, n) / points
    rotate = ortho_group.rvs(dim=n)
    return np.matmul(coords - center, rotate) + deviation * np.random.randn(1, n)

And as an example, let's do a cube.

def cube_corners ():
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                yield (i, j, k)

def cube_distances():
    distances = []
    for i1, j1, k1 in cube_corners():
        row = []
        for i2, j2, k2 in cube_corners():
            row.append(((i1-i2)**2 + (j1-j2)**2 + (k1-k2)**2)**0.5)
        distances.append(row)
    return np.array(distances)

distances = cube_distances()
coords = distances_to_normalized_coords(distances, 3)
print(randomly_position(coords, 5))

And if you want to actually check that the answer makes sense.

for i in range(len(distances)):
    for j in range(len(distances[i])):
        print(i, j, distances[i][j], (np.sum(np.square(coords[i] - coords[j])))**0.5)

For debugging purposes, here is the whole program and a little test code.

import numpy as np

def project_distances(d, coords, point, epsilon=10**-10):
    points = d.shape[0]
    point_coords = coords[point].reshape(1, coords.shape[1])
    known_d2 = np.sum(np.square(coords - point_coords), axis=1)
    project_d2 = np.square(d[point]) - known_d2
    # Fix potential roundoff errors.
    project_d2 = np.where(project_d2 < epsilon, 0, project_d2)
    return np.sqrt(project_d2)

def distances_to_normalized_coords (d, n, epsilon=10**-10):
    points = d.shape[0]
    # Start with everything 0.
    coords = np.zeros((points, n))
    project_d_from_0 = d[0]
    for dim in range(n):
        # Find the farthest away point to use to orient this dim.
        best_point = 0
        best_d = 0
        for point in range(1, points):
            if best_d < project_d_from_0[point]:
                best_point = point
                best_d = project_d_from_0[point]
        if best_d < epsilon:
            break # We're done.

        project_d_from_best = project_distances(d, coords, best_point, epsilon)
        # Use Heron's formula to find areas of triangles from 0 to best_point, to any point.
        s = (best_d + project_d_from_0 + project_d_from_best) / 2
        area2 = s * (s - best_d) * (s - project_d_from_0) * (s - project_d_from_best)
        area2 = np.where(area2 < epsilon, 0, area2)
        area = np.sqrt(area2)
        # area = base * height / 2. The base is best_d. Find the height.
        height = 2 * area / best_d
        # height is at right angles to this dimension. Find this dimension.
        distance2 = np.square(project_d_from_0) - np.square(height)
        distance2 = np.where(distance2 < epsilon, 0, distance2)
        distance = np.sqrt(distance2)

        # Actual coordinates can be +- distance. Which better explains project_d_from_best?
        predict_best_d_plus = np.sqrt(np.square(height) + np.square(best_d - distance))
        predict_best_d_minus = np.sqrt(np.square(height) + np.square(best_d + distance))
        column = np.where(np.abs(predict_best_d_plus - project_d_from_best) <= np.abs(predict_best_d_minus - project_d_from_best), distance, -distance)

        # Copy the column to the coordinates.
        for point in range(1, points):
            coords[point][dim] = column[point]

        # And reproject with current coords.
        project_d_from_0 = project_distances(d, coords, 0, epsilon)

    return coords

from scipy.stats import ortho_group
def randomly_position(coords, deviation):
    points, n = coords.shape
    center = np.sum(coords, axis=0).reshape(1, n) / points
    rotate = ortho_group.rvs(dim=n)
    return np.matmul(coords - center, rotate) + deviation * np.random.randn(1, n)

def random_points (count, dims):
    return np.random.randn(count, dims)

def points_to_distances (points):
    return np.array([
        [(np.sum(np.square(points[i] - points[j])))**0.5 for j in range(len(points))]
        for i in range(len(points))])

for _ in range(100):
    points = random_points(8, 3)
    distances = points_to_distances(points)
    coords = distances_to_normalized_coords(distances, 3)
    coords = randomly_position(coords, 1)
    for i in range(len(distances)):
        for j in range(len(distances[i])):
            coord_distance = (np.sum(np.square(coords[i] - coords[j])))**0.5
            if 0.00000001 < abs(coord_distance - distances[i][j]):
                print("points", points)
                print("distances", distances)
                print("coords", coords)
                print(i, j, distances[i][j], (np.sum(np.square(coords[i] - coords[j])))**0.5)
MvG:

With more spatial dimensions than points (i.e. N > P) this should be possible, if the distances are valid, which in particular means they have to satisfy the triangle inequality.

Let's take N = 3 for intuition. The first point you can pick anywhere. The distance between the first and second defines a sphere around the first; the second has to lie somewhere on that sphere. The third has two distances to points you already placed, so it has to lie on the intersection of the two corresponding spheres, which is a circle. A potential fourth point would lie on one of the two points where three spheres intersect. For a fifth point you'd run out of dimensions, and rounding errors might make it difficult to satisfy all requirements simultaneously even if the distances originate from a real 3d configuration. That's why N > P is useful: you likely avoid this headache.

In terms of implementation, the above suggests that you would need to uniformly sample from hyperspheres of decreasing dimensions. You'd also have to intersect hyperspheres, and translate from the sample space to the actual positions. I don't know what tools numpy has to offer to help with any of this.

Personally I'd also explore a different approach: generate the whole configuration in a simple well-defined position and orientation, then apply a random isometry (rotation, translation, perhaps reflection) to it. You would place the first point in the origin. The second point goes on the positive x1-axis. The third on the x1-x2-plane with positive x2-coordinate, and so on. So the number of zeros in the coordinate vector decreases by one for each point, and the newest coordinate is always positive. This should in general give you uniquely determined coordinates, shifting the whole randomisation to an operation on the complete configuration.
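The triangular layout described above (point i nonzero only in its first i coordinates, newest coordinate positive) is, up to conventions, exactly what a Cholesky factorization of the Gram matrix produces. A minimal sketch of that idea, with my own function name; it assumes the distances come from affinely independent points, otherwise the Gram block is only positive semi-definite and the factorization fails:

```python
import numpy as np

def triangular_coords(d):
    """Coordinates with point 0 at the origin and point i using only
    the first i axes, recovered from the distance matrix d.

    Assumes the points are affinely independent; otherwise
    np.linalg.cholesky raises.
    """
    p = d.shape[0]
    d2 = np.square(d)
    # Gram matrix of inner products with point 0 at the origin:
    # G[i, j] = (d[0,i]^2 + d[0,j]^2 - d[i,j]^2) / 2
    g = (d2[0][:, None] + d2[0][None, :] - d2) / 2.0
    coords = np.zeros((p, p - 1))
    # Row 0 (point 0) stays at the origin; the Cholesky factor of the
    # remaining block is lower triangular with a positive diagonal,
    # which is exactly the zero/sign pattern described above.
    coords[1:] = np.linalg.cholesky(g[1:, 1:])
    return coords
```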

I haven't yet read the literature, but I guess randomised sampling of isometries has been discussed somewhere. Perhaps just applying a sequence of random operations, like some rotations around specific axes, will already make the result random enough? That depends on your requirements.
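One standard recipe for such a random isometry (a sketch of my own, not from the answer): take the QR decomposition of a Gaussian matrix, fix the signs so the orthogonal factor is uniformly (Haar) distributed, then rotate and translate the whole configuration. Pairwise distances are preserved exactly:

```python
import numpy as np

def random_isometry(coords, shift_scale=1.0, rng=None):
    """Apply a uniformly random rotation/reflection plus a random
    translation to the whole point configuration."""
    rng = np.random.default_rng(rng)
    n = coords.shape[1]
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Multiply each column by the sign of R's diagonal so Q is
    # Haar-distributed rather than biased by the QR convention.
    q *= np.sign(np.diag(r))
    return coords @ q + shift_scale * rng.standard_normal((1, n))
```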