I know the title is confusing, but I don't know how to describe it better, so let the code explain itself.
I have a third-party library that defines a complex scalar as
typedef struct {
    float real;
    float imag;
} cpx;
so a complex array/vector looks like
cpx array[10];
for (int i = 0; i < 10; i++)
{
    /* array[i].real and array[i].imag are the real/imag parts of the i-th element */
}
The current situation is this: I have a function that takes two float arrays as arguments, and inside it I use two temporary local complex arrays, like:
void my_func(float *x, float *y) /* x is input, y is output, length is fixed, say 10 */
{
    cpx tmp_cpx_A[10]; /* two local cpx arrays */
    cpx tmp_cpx_B[10];
    for (int i = 0; i < 10; i++) /* tmp_cpx_A is based on input x */
    {
        tmp_cpx_A[i].real = do_some_calculation(x[i]);
        tmp_cpx_A[i].imag = do_some_other_calculation(x[i]);
    }
    some_library_function(tmp_cpx_A, tmp_cpx_B); /* tmp_cpx_B is based on tmp_cpx_A, out-of-place */
    for (int i = 0; i < 10; i++) /* output y is based on tmp_cpx_B */
    {
        y[i] = do_final_calculation(tmp_cpx_B[i].real, tmp_cpx_B[i].imag);
    }
}
I notice that x is no longer needed after the first loop, and that the second loop could run in place. If I could build tmp_cpx_B on the same memory as x and y, I would save half of the intermediate memory.
If the complex array were defined as
typedef struct {
    float *real;
    float *imag;
} cpx_alt;
then I could simply write
cpx_alt tmp_cpx_B;
tmp_cpx_B.real = x;
tmp_cpx_B.imag = y;
and do the rest, but it is not defined that way.
I cannot change the definition of the third-party library's complex structure, and I cannot take cpx as input, because I want to hide the internal library from outside users and not break the API.
So I wonder whether it is possible to initialize an array of structs with scalar members, like cpx, from scalar arrays like x and y.
Edit 1: answers to some commonly asked questions:
- In practice the array length is up to 960, which means one tmp_cpx array takes 7680 bytes. My platform has 56k of RAM in total, so saving one tmp_cpx array saves ~14% of memory.
- The third-party library is kissFFT, which performs FFTs on complex arrays. It defines its own kiss_fft_cpx instead of using the standard <complex.h> because it can use a macro to switch between floating-point and fixed-point calculation.
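The memory figures above can be checked with a few lines of C. This is only a sketch: it assumes a 4-byte float with no struct padding, and it assumes "56k" means 56 KiB (57344 bytes); the names MAX_LEN, TOTAL_RAM, cpx_array_bytes and ram_share_permille are made up for illustration.

```c
#include <stddef.h>

/* Same layout as the cpx struct from the question. */
typedef struct {
    float real;
    float imag;
} cpx;

enum { MAX_LEN = 960 };          /* maximum array length from the question */
enum { TOTAL_RAM = 56 * 1024 };  /* assumption: "56k" means 56 KiB */

/* Bytes occupied by one temporary cpx array of n elements. */
static size_t cpx_array_bytes(size_t n)
{
    return n * sizeof(cpx);
}

/* Share of total RAM taken by one such array, in tenths of a percent. */
static unsigned ram_share_permille(size_t n)
{
    return (unsigned)(cpx_array_bytes(n) * 1000u / TOTAL_RAM);
}
```

Under these assumptions one array of 960 elements takes 7680 bytes, a bit over 13% of RAM, which is in line with the rough estimate above.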
First of all, please note that C has a standardized library for complex numbers, <complex.h>. You might want to use that one instead of some non-standard 3rd party lib.
The main problem with your code might be execution speed, not memory usage. Allocating 2 * 10 * 2 = 40 floats isn't a big deal on most systems. On the other hand, you touch the same memory area over and over again. This might be needlessly inefficient.
Consider something like this instead:
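A minimal sketch of what a single fused loop might look like, using <complex.h>. It rests on a strong assumption: that the library step can be applied one element at a time, which is not the case for a transform such as an FFT that needs the whole array. The function per_element_op is a hypothetical stand-in for some_library_function, and the three helper bodies are placeholders, since the question does not say what they compute.

```c
#include <complex.h>

#define LEN 10  /* fixed length, as in the question */

/* Placeholder bodies: the question does not define these calculations. */
static float do_some_calculation(float v)             { return 2.0f * v; }
static float do_some_other_calculation(float v)       { return v + 1.0f; }
static float do_final_calculation(float re, float im) { return re + im; }

/* Hypothetical per-element stand-in for some_library_function. */
static float complex per_element_op(float complex z)
{
    return conjf(z);
}

void my_func(float *x, float *y)
{
    for (int i = 0; i < LEN; i++) {
        /* Build one complex value from x[i] instead of filling a whole array. */
        float complex z = do_some_calculation(x[i])
                        + do_some_other_calculation(x[i]) * I;

        /* Transform it while it is still held in a local variable. */
        z = per_element_op(z);

        /* Reduce it straight into the output. */
        y[i] = do_final_calculation(crealf(z), cimagf(z));
    }
}
```

With this shape the only temporary is a single complex scalar, and because each x[i] is read before y[i] is written, the loop would also tolerate x and y pointing at the same buffer.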
Fewer instructions and less branching. And as a bonus, less stack usage.
In theory you might also gain a few CPU cycles by restrict-qualifying the parameters, though I didn't spot any improvement when I tried that on this code (gcc x86-64).
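For reference, restrict-qualifying the parameters could look roughly like the sketch below. The function name my_func_restrict, the explicit length parameter n and the helper halve are hypothetical and only there to make the example compile on its own.

```c
#include <stddef.h>

/* Placeholder calculation, only for illustration. */
static float halve(float v)
{
    return 0.5f * v;
}

/* restrict promises the compiler that x and y do not overlap during the call,
   so it may keep loaded values in registers or reorder loads and stores. */
void my_func_restrict(size_t n, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = halve(x[i]);
    }
}
```

The promise cuts both ways: a caller that passes overlapping buffers for x and y to such a function invokes undefined behavior, so this qualifier does not fit a design where the input memory is reused for the output.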