How to do a 'for' loop in tables for a PostgreSQL function in C

Question

How to do a 'for' loop in tables for a PostgreSQL function in C

260 Views Asked by Matheus Farias At 14 August 2023 at 19:18

I was thinking of developing a function for AGE that returns a graph's adjacency matrix. An adjacency matrix stores the number of edges between two nodes. The complexity to check if an edge exists within an adjacency matrix is O(1), while the space complexity is O(v^2) with v = number of vertices. AGE stores the vertices of each graph in graph_name._ag_label_vertex, while the edges are stored in graph_name._ag_label_edge.

The vertex table contains two columns: id and properties. The id column is of integer type, while the properties column is a JSON like format. The edge table contains the id of the edge, the id of the vertex which it comes from, the id of the vertex that it goes to, and the edge's properties.

Example:

SELECT * FROM "graph_name"."Vertex";
       id         | properties
------------------+-------------
 844424930131971  | {}
 1407374883553281 | {}


SELECT * FROM "graph_name"."Edges";
        id        |    start_id     |      end_id      | properties
------------------+-----------------+------------------+------------
 1125899906842625 | 844424930131971 | 1407374883553281 | {}
(1 row)

I was thinking on traversing the vertex column first, so that it is possible to store the vertices ids initially:

[0, 1, 2, 3]
[1, 0, 0, 0]
[2, 0, 0, 0]
[3, 0, 0, 0]

After that, loop through the edges table and store 1 if the edge between two nodes exists (if there are multiple edges between a pair of nodes, add 1 to the already set value for each edge). But I don't know exactly how to do this in C for a PostgreSQL function. Which function should I call to traverse the tables and get the values from the id column?

I thought of using SPI twice, first for the vertices and the second time for the edges, but I don't know if this is the best approach.

In the age--1.3.0.sql file, I declared the function as

CREATE FUNCTION ag_catalog.get_adjacency_matrix(graph_name name)
RETURNS integer[][]
LANGUAGE c
AS 'MODULE_PATHNAME';

And its definition in src/backend/commands/graph_commands.c:

PG_FUNCTION_INFO_V1(get_adjacency_matrix);
/*
* Function for returning the graph's adjacency matrix. This adjacency matrix
* stores the number of edges between two nodes. The complexity to check if an
* edge exists within an adjacency matrix is O(1), while the space complexity
* is O(v^2) with v = number of vertices.
*/
Datum get_adjacency_matrix(PG_FUNCTION_ARGS)
{
    PG_RETURN_VOID();
}

Original Q&A

There are 2 best solutions below

Mohamed Mokhtar - rrrokhtar On 15 August 2023 at 14:28

That's not an answer but it can be considered as a guide from another function:

By looking at get_all_edge_labels_per_graph At some point we can figure out fetching tuples inside it as the following using those functions:

ScanKeyInit: used for filtration
table_open: used for opening the table xd
heap_getnext: used for retrieving tubles

Function:


/*
 * Retrieves a list of all the names of a graph.
 *
 * XXX: We may want to use the cache system for this function,
 * however the cache system currently requires us to know the
 * name of the label we want.
 */
List *get_all_edge_labels_per_graph(EState *estate, Oid graph_oid)
{
    List *labels = NIL;
    ScanKeyData scan_keys[2];
    Relation ag_label;
    TableScanDesc scan_desc;
    HeapTuple tuple;
    TupleTableSlot *slot;
    ResultRelInfo *resultRelInfo;

    // setup scan keys to get all edges for the given graph oid
    ScanKeyInit(&scan_keys[1], Anum_ag_label_graph, BTEqualStrategyNumber,
                F_OIDEQ, ObjectIdGetDatum(graph_oid));
    ScanKeyInit(&scan_keys[0], Anum_ag_label_kind, BTEqualStrategyNumber,
                F_CHAREQ, CharGetDatum(LABEL_TYPE_EDGE));

    // setup the table to be scanned
    ag_label = table_open(ag_label_relation_id(), RowExclusiveLock);
    scan_desc = table_beginscan(ag_label, estate->es_snapshot, 2, scan_keys);

    resultRelInfo = create_entity_result_rel_info(estate, "ag_catalog",
                                                  "ag_label");

    slot = ExecInitExtraTupleSlot(
        estate, RelationGetDescr(resultRelInfo->ri_RelationDesc),
        &TTSOpsHeapTuple);

    // scan through the results and get all the label names.
    while(true)
    {
        Name label;
        bool isNull;
        Datum datum;

        tuple = heap_getnext(scan_desc, ForwardScanDirection);

        // no more labels to process
        if (!HeapTupleIsValid(tuple))
            break;

        ExecStoreHeapTuple(tuple, slot, false);

        datum = slot_getattr(slot, Anum_ag_label_name, &isNull);
        label = DatumGetName(datum);

        labels = lappend(labels, label);
    }

    table_endscan(scan_desc);

    destroy_entity_result_rel_info(resultRelInfo);
    table_close(resultRelInfo->ri_RelationDesc, RowExclusiveLock);

    return labels;
}

Source:

https://github.com/apache/age/tree/master/src/backend/catalog/ag_label.c#L274

**Matheus Farias** · Accepted Answer · 2023-08-18T23:30:48.860000

Apparently, the best way to traverse a table utilizing a C function for PostgreSQL is with the Server Programming Interface (SPI). Here's a code example:

SPI_connect();

char *query = psprintf("SELECT id FROM \"GraphName\".\"_ag_label_vertex\" ORDER BY id");

// Execute the query and check for errors.
int ret = SPI_execute(query, true, 0);
if (ret != SPI_OK_SELECT)
{
    elog(ERROR, "Error when fetching the vertices");
    SPI_finish();
    PG_RETURN_NULL();
}

// And then here is how to traverse the resulting table.
for (i = 0; i < SPI_processed; i++)
{
    char *result_string = SPI_getvalue(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 1);
    int vertex_id = DatumGetInt64(DirectFunctionCall1(int8in, CStringGetDatum(result_string)));
}

From PostgreSQL's Documentation:

When we call SPI_execute(const char * command, bool read_only, long count), it will execute the specified SQL command for count rows, and, if read_only is true, the command must be read only. If count is zero then the command is executed for all rows. This function returns a value depending on what query was executed. In this case, it returns SPI_OK_SELECT if the execution was successful.

The actual number of rows for which the (last) command was executed is returned in the global variable SPI_processed. If the return value of the function is indeed SPI_OK_SELECT, then we can use the global pointer SPITupleTable *SPI_tuptable to access the result rows.

The SPITupleTable struct is declared as:

typedef struct SPITupleTable
{
    /* Public members */
    TupleDesc   tupdesc;        /* tuple descriptor */
    HeapTuple  *vals;           /* array of tuples */
    uint64      numvals;        /* number of valid tuples */

    /* Private members, not intended for external callers */
    uint64      alloced;        /* allocated length of vals array */
    MemoryContext tuptabcxt;    /* memory context of result table */
    slist_node  next;           /* link for internal bookkeeping */
    SubTransactionId subid;     /* subxact in which tuptable was created */
} SPITupleTable;

We can access the corresponding row from the SPITupleTable by doing SPI_tuptable->vals[i] where i is the number of the row. To get the value of a certain column from the row, in this case, we need to call SPI_getvalue(HeapTuple row, TupleDesc rowdesc, int colnumber). This function will return a char* to the corresponding data.

If we want to convert it to int64, we call DatumGetInt64(DirectFunctionCall1(int8in, CStringGetDatum(result_string))).

How to do a 'for' loop in tables for a PostgreSQL function in C

There are 2 best solutions below

Related Questions in C

Related Questions in POSTGRESQL

Related Questions in GRAPH-THEORY

Related Questions in APACHE-AGE

Trending Questions

Popular # Hahtags

Popular Questions