Effective way to add a "time" dimension to a two-dimensional (x, y) NetCDF file whose variables represent the same variable at different times

110 Views Asked by Jugilismaani At 02 August 2023 at 07:37

I had a CSV file with x and y coordinates, as well as variable values for three different time steps, as follows:

x, y, var_t1, var_t2, var_t3
1, 1, 8,      8,      6
1, 2, 6,      1,      2
2, 1, 5,      3,      7
2, 2, 7,      2,      6

I have learned to create a NetCDF file with the following method:

import xarray as xr
xr.Dataset.from_dataframe(df.set_index(['x', 'y'])).to_netcdf('filename.nc')

This results in a NetCDF with x and y as dimensions, and I get 3 different variables.

My goal was to create a NetCDF with x, y and t as dimensions with a single variable.

I managed to achieve this but I feel like I did it in a very complicated fashion.

My solution was to play with the CSV file and make it 3 times longer, while adding a "t" column to represent time steps:

x, y, t, var_t1, var_t2, var_t3
1, 1, 0, 8,      0,      0
1, 2, 0, 6,      0,      0
2, 1, 0, 5,      0,      0
2, 2, 0, 7,      0,      0
1, 1, 1, 0,      8,      0
1, 2, 1, 0,      1,      0
2, 1, 1, 0,      3,      0
2, 2, 1, 0,      2,      0
1, 1, 2, 0,      0,      6
1, 2, 2, 0,      0,      2
2, 1, 2, 0,      0,      7
2, 2, 2, 0,      0,      6

Now when I apply

import xarray as xr
xr.Dataset.from_dataframe(df.set_index(['x', 'y', 't'])).to_netcdf('filename.nc')

I get a NetCDF file with x, y and t dimensions, in which each variable is non-zero only at its own time step (i.e. when t = 1, only var_t2 != 0).

Would there be a way to achieve this in a much simpler way, in case I encounter a similar problem in the future? This was easy to do with only 3 time steps, but I would be in trouble with tens or thousands of time steps.

Thank you!

There are 1 best solutions below

jspaeth On 02 August 2023 at 08:18 BEST ANSWER

Say you have the dataframe df:

>> df
   x  y  var_t1  var_t2  var_t3
0  1  1       8       8       6
1  1  2       6       1       2
2  2  1       5       3       7
3  2  2       7       2       6

You can set x and y as the index, convert the dataframe to an xarray Dataset, stack the variables var_t1, var_t2 and var_t3 along a new "time" dimension with to_array, and assign new_times as the coordinates of that dimension:

>> ds = df.set_index(["x", "y"]).to_xarray()
>> ds
<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2
Data variables:
    var_t1   (x, y) int64 8 6 5 7
    var_t2   (x, y) int64 8 1 3 2
    var_t3   (x, y) int64 6 2 7 6


>> new_times = range(3)
>> ds_result = ds.to_array(dim="time").assign_coords(time=new_times)
>> ds_result
<xarray.DataArray (time: 3, x: 2, y: 2)>
array([[[8, 6],
        [5, 7]],

       [[8, 1],
        [3, 2]],

       [[6, 2],
        [7, 6]]])
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2
  * time     (time) int64 0 1 2
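
For reference, the steps above can be put together into one runnable sketch that starts from the CSV text in the question and ends with a Dataset holding a single variable on (time, x, y). The function name wide_to_time_dataset and its parameters var_name and prefix are hypothetical, introduced only for this illustration. The sketch assumes the per-time columns are named var_t1, var_t2, ... with integer suffixes, and that zero-based time steps are wanted, as in the question.

```python
import io

import pandas as pd
import xarray as xr

# Sample data from the question (wide layout: one column per time step).
csv_text = """x,y,var_t1,var_t2,var_t3
1,1,8,8,6
1,2,6,1,2
2,1,5,3,7
2,2,7,2,6
"""


def wide_to_time_dataset(df, var_name="var", prefix="var_t"):
    """Stack the `prefix`* columns of df along a new "time" dimension.

    Hypothetical helper; assumes column names of the form var_t<N>.
    """
    ds = df.set_index(["x", "y"]).to_xarray()
    # Keep only the per-time-step variables, in their original order.
    cols = [name for name in ds.data_vars if str(name).startswith(prefix)]
    # to_array stacks the selected variables along the new dimension.
    da = ds[cols].to_array(dim="time", name=var_name)
    # Derive zero-based time steps from the column suffix (var_t1 -> 0).
    steps = [int(str(name)[len(prefix):]) - 1 for name in cols]
    da = da.assign_coords(time=steps)
    # A named DataArray converts to a Dataset with one data variable.
    return da.to_dataset()


df = pd.read_csv(io.StringIO(csv_text))
ds_time = wide_to_time_dataset(df)
print(ds_time)
```

Calling ds_time.to_netcdf('filename.nc') afterwards should write the file, provided a NetCDF backend such as netCDF4, h5netcdf or scipy is installed. Because the time steps are read from the column names, the same code should carry over unchanged to tens or thousands of columns.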
