Filter xarray ZARR dataset with GeoDataFrame

164 Views Asked by Alvin Sartor At 21 September 2023 at 13:26

I am reading a ZARR file from a s3 bucket with xarray. I got to successfully filter by time and latitude/longitude:

    def read_zarr(self, dataset: str, region: Region) -> Any:
        # Read ZARR from s3 bucket
        fs = s3fs.S3FileSystem(key="KEY", secret="SECRET")
        mapper = fs.get_mapper(f"{self.S3_PATH}{dataset}")
        zarr_ds = xr.open_zarr(mapper, decode_times=True)

        # Filter by time
        time_period = pd.date_range("2013-01-01", "2023-01-31")
        zarr_ds = zarr_ds.sel(time=time_period)

        # Filter by latitude/longitude
        region_gdf = region.geo_data_frame
        latitude_slice = slice(region_gdf.bounds.miny[0], region_gdf.bounds.maxy[0])
        longitude_slice = slice(region_gdf.bounds.minx[0], region_gdf.bounds.maxx[0])
        return zarr_ds.sel(latitude=latitude_slice, longitude=longitude_slice)

The problem is that this returns a rectangle of data (actually a cuboid, if we consider the time dimension). For geographical regions that are long and thin, this will represent a huge waste, as I will first download years of data, to then discard most of it. Example with California:

I would like to intersect the ZARR coordinates with the region ones. How can I achieve it?

Original Q&A

There are 1 best solutions below

pasnik On 25 September 2023 at 18:33 BEST ANSWER

One option is to use rioxarray and then clip to your region, very similar to this thread.

You will need to set the coordinate system. Assuming your data are given as standard lat/lon, this is epsg 4326 (see opening sentences here). You may need to use region_gdf.to_crs('epsg:4326') to make sure that the coordinate systems are the same (or set_crs if it doesn't have a coordinate reference system already).

Once this formatting is done, you can use the rio.clip method and set drop=True to only keep the points in the region.

New lines below - it might not work exactly as written because I can't test it, but I think it should get you close.

import rioxarray 
zarr_ds.rio.set_spatial_dims(x_dim="lon", y_dim="lat", inplace=True)
zarr_ds.rio.write_crs("epsg:4326", inplace=True)
zarr_ds_clipped = zarr_ds.rio.clip(region_gdf, drop=True)

Filter xarray ZARR dataset with GeoDataFrame

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in GEOJSON

Related Questions in GEOPANDAS

Related Questions in PYTHON-XARRAY

Related Questions in ZARR

Trending Questions

Popular # Hahtags

Popular Questions