repurpose package

Submodules

repurpose.img2ts module

class repurpose.img2ts.Img2Ts(input_dataset, outputpath, startdate, enddate, input_kwargs={}, input_grid=None, target_grid=None, imgbuffer=100, variable_rename=None, unlim_chunksize=100, cellsize_lat=180.0, cellsize_lon=360.0, r_methods='nn', r_weightf=None, r_min_n=1, r_radius=18000, r_neigh=8, r_fill_values=None, filename_templ='%04d.nc', gridname='grid.nc', global_attr=None, ts_attributes=None, ts_dtypes=None, time_units='days since 1858-11-17 00:00:00', zlib=True)

Bases: object

Class that uses the read_img iterator of the input dataset to read all images between startdate and enddate and saves them in netCDF time series files according to the given netCDF class and the cell structure of the output grid.

Parameters:
  • input_dataset (DatasetImgBase like class instance) – must implement a read(date, **input_kwargs) iterator that returns a pygeobase.object_base.Image.

  • outputpath (string) – path where to save the time series to

  • startdate (date) – date from which the time series should start. Images have to be available from this date onwards.

  • enddate (date) – date when the time series should end. Images should be available up until this date.

  • input_kwargs (dict, optional) – keyword arguments which should be used in the read_img method of the input_dataset

  • input_grid (grid instance as defined in pytesmo.grids.grid, optional) – the grid on which input data is stored. If not given then the grid of the input dataset will be used. If the input dataset has no grid object then resampling to the target_grid is performed.

  • target_grid (grid instance as defined in pytesmo.grids.grid, optional) – the grid on which the time series will be stored. If not given then the grid of the input dataset will be used.

  • imgbuffer (int, optional) – number of days worth of images that should be read into memory before a time series is written. Choose this value so that the memory of your machine is well utilized; it depends on the daily data volume of the input dataset.

  • variable_rename (dict, optional) – dictionary that renames the variables for the time series, if they should have names other than the keys of the dict returned by the daily_images iterator.

  • unlim_chunksize (int, optional) – netCDF chunksize for unlimited variables.

  • cellsize_lat (float, optional) – if outgrid or input_data.grid are not cell grids then the cellsize in latitude direction can be specified here. Default is 1 global cell.

  • cellsize_lon (float, optional) – if outgrid or input_data.grid are not cell grids then the cellsize in longitude direction can be specified here. Default is 1 global cell.

  • r_methods (string or dict, optional) – resample methods to use if resampling is necessary, either ‘nn’ for nearest neighbour or ‘custom’ for custom weight function. Can also be a dictionary in which the method is specified for each variable

  • r_weightf (function or dict, optional) – if r_methods is custom this function will be used to calculate the weights depending on distance. This can also be a dict with a separate weight function for each variable.

  • r_min_n (int, optional) – Minimum number of neighbours on the target_grid that are required for a point to be resampled.

  • r_radius (float, optional) – search radius for neighbours during resampling, given in meters

  • r_neigh (int, optional) – maximum number of neighbours found inside r_radius to use during resampling. If more are found the r_neigh closest neighbours will be used.

  • r_fill_values (number or dict, optional) – if given, the resampled output array is filled with this value wherever no valid resampled value could be computed; if not given, a masked array is returned. Can also be a dict with a fill value for each variable.

  • filename_templ (string, optional) – filename template; must be a string with a format specifier for the cell number, e.g. ‘%04d.nc’ translates to the filename ‘0001.nc’ for cell number 1.

  • gridname (string, optional) – filename of the grid which will be saved as netCDF

  • global_attr (dict, optional) – global attributes for each file

  • ts_attributes (dict, optional) – dictionary of attributes that should be set for the netCDF time series. Can be either a dictionary of attributes that will be set for all variables in input_data or a dictionary of dictionaries. In the second case the first dictionary has to have a key for each variable returned by input_data and the second level dictionary will be the dictionary of attributes for this time series.

  • ts_dtypes (numpy.dtype or dict of numpy.dtypes, optional) – data type to use for the time series. If it is a dict then a key must exist for each variable returned by input_data. Default: None, no change from the input data.

  • time_units (string, optional) – units the time axis is given in. Default: “days since 1858-11-17 00:00:00”, which is modified Julian date. For regular images this can be set freely since the conversion is done automatically; for images with irregular timestamps it is ignored for now.

  • zlib (boolean, optional) – if True the saved netCDF files will be compressed. Default: True
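
Two of the defaults above can be checked in isolation. The following sketch uses only the Python standard library: it expands the default filename_templ for a cell number and expresses a date in the default time_units (days since 1858-11-17, the modified Julian date). The helper days_since_mjd_epoch is hypothetical and not part of repurpose.

```python
from datetime import datetime

# Default template from the signature; the cell number fills the %04d slot.
filename_templ = "%04d.nc"
cell_filename = filename_templ % 1
print(cell_filename)  # 0001.nc

# Reference epoch taken from the default time_units string.
MJD_EPOCH = datetime(1858, 11, 17)


def days_since_mjd_epoch(timestamp):
    """Hypothetical helper: fractional days between timestamp and the epoch."""
    return (timestamp - MJD_EPOCH).total_seconds() / 86400.0


print(days_since_mjd_epoch(datetime(2000, 1, 1)))  # 51544.0
```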

calc()

Goes through all images and retrieves a stack of them, then goes through all grid points in cell order and writes them to netCDF files.

img_bulk()

Yields a stack of self.const.imgbuffer images together with its start and end date, until all dates have been read.

Returns:

  • img_stack_dict (dict of numpy.array) – stack of daily images for each variable

  • startdate (date) – date of first image in stack

  • enddate (date) – date of last image in stack

  • datetimestack (np.array) – array of the timestamps of each image

  • jd_stack (np.array or None) – if None all observations in an image have the same observation timestamp. Otherwise it gives the julian date of each observation in img_stack_dict
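
The date bookkeeping behind such a buffered read can be sketched as follows. This is not the library's implementation: it assumes one image per day and only shows how a date range is split into chunks of at most imgbuffer days. The generator iter_date_chunks is a hypothetical name.

```python
from datetime import date, timedelta


def iter_date_chunks(startdate, enddate, imgbuffer):
    """Yield (chunk_start, chunk_end, dates) covering startdate..enddate.

    Hypothetical helper: each chunk holds at most `imgbuffer` daily dates,
    and both startdate and enddate are included.
    """
    if imgbuffer < 1:
        raise ValueError("imgbuffer must be at least 1")
    current = startdate
    while current <= enddate:
        chunk_end = min(current + timedelta(days=imgbuffer - 1), enddate)
        n_days = (chunk_end - current).days + 1
        dates = [current + timedelta(days=i) for i in range(n_days)]
        yield current, chunk_end, dates
        current = chunk_end + timedelta(days=1)


chunks = list(iter_date_chunks(date(2020, 1, 1), date(2020, 1, 10), imgbuffer=4))
for chunk_start, chunk_end, dates in chunks:
    print(chunk_start, chunk_end, len(dates))
```

With ten days and a buffer of four, the last chunk is shorter than the others, which is why the start and end date are reported with every stack.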

exception repurpose.img2ts.Img2TsError

Bases: Exception

repurpose.resample module

repurpose.resample.hamming_window(radius, distances)

Hamming window filter.

Parameters:
  • radius (float32) – Radius of the window.

  • distances (numpy.ndarray) – Array with distances.

Returns:

weights – Distance weights.

Return type:

numpy.ndarray
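
A distance weighting of this kind can be sketched with NumPy. The code below is not taken from the library source: it assumes the conventional Hamming coefficients (0.54 and 0.46) and a taper that falls from 1 at distance 0 to 0.08 at distance radius. The name hamming_weights is hypothetical. Because the custom weight functions described in this module are of the form func(distance), the radius is bound with functools.partial.

```python
from functools import partial

import numpy as np


def hamming_weights(radius, distances):
    """Hypothetical sketch: Hamming taper, 1.0 at distance 0, 0.08 at `radius`."""
    distances = np.asarray(distances, dtype=float)
    alpha = 0.54  # conventional Hamming coefficient (assumption of this sketch)
    return alpha + (1.0 - alpha) * np.cos(np.pi * distances / radius)


# Bind the radius so that the callable has the func(distance) form.
weight_func = partial(hamming_weights, 18000.0)
weights = weight_func(np.array([0.0, 9000.0, 18000.0]))
print(weights)
```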

repurpose.resample.resample_to_grid(input_data, src_lon, src_lat, target_lon, target_lat, methods='nn', weight_funcs=None, min_neighbours=1, search_rad=18000, neighbours=8, fill_values=None)

Resamples data from a dictionary of numpy arrays to the given grid using pyresample. Searches for the neighbours and then resamples the data to the target grid (target_lon, target_lat) if at least min_neighbours neighbours are found.

Parameters:
  • input_data (dict of numpy.arrays) –

  • src_lon (numpy.array) – longitudes of the input data

  • src_lat (numpy.array) – latitudes of the input data

  • target_lon (numpy.array) – longitudes of the output data

  • target_lat (numpy.array) – latitudes of the output data

  • methods (string or dict, optional) – method of spatial averaging, passed on to pyresample. Can be ‘nn’ (nearest neighbour) or ‘custom’ (a custom weight function has to be supplied in weight_funcs); see the pyresample documentation for more details. Can also be a dictionary with a method for each array in the input data dict.

  • weight_funcs (function or dict of functions, optional) – if the method is ‘custom’, a function like func(distance) has to be given. Can also be a dictionary with a function for each array in the input data dict.

  • min_neighbours (int, optional) – if given, only points with at least this number of neighbours will be resampled. Default: 1

  • search_rad (float, optional) – search radius of the neighbour search, in meters. Default: 18000

  • neighbours (int, optional) – maximum number of neighbours to look for at each input grid point. Default: 8

  • fill_values (number or dict, optional) – if given, the output array is filled with this value wherever no valid resampled value could be computed; if not given, a masked array is returned. Can also be a dict with a fill value for each variable.

Returns:

data – resampled data on given grid

Return type:

dict of numpy.arrays

Raises:

ValueError – if an empty dataset is resampled
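
How search_rad, neighbours, min_neighbours and a fill value interact can be illustrated with a brute-force nearest-neighbour search. This is only a sketch of the parameter semantics as documented above, for a single variable, in pure Python; the library itself delegates the work to pyresample. The functions great_circle_m and nn_resample and the Earth radius constant are assumptions of this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an assumption of this sketch


def great_circle_m(lon1, lat1, lon2, lat2):
    """Haversine distance in meters between two lon/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nn_resample(values, src_lon, src_lat, target_lon, target_lat,
                min_neighbours=1, search_rad=18000, neighbours=8,
                fill_value=None):
    """Hypothetical brute-force nearest-neighbour resampling of one variable."""
    if len(values) == 0:
        raise ValueError("cannot resample an empty dataset")
    resampled = []
    for t_lon, t_lat in zip(target_lon, target_lat):
        # Collect all source points that lie inside the search radius.
        in_radius = []
        for s_lon, s_lat, value in zip(src_lon, src_lat, values):
            dist = great_circle_m(s_lon, s_lat, t_lon, t_lat)
            if dist <= search_rad:
                in_radius.append((dist, value))
        in_radius.sort(key=lambda pair: pair[0])
        nearest = in_radius[:neighbours]  # keep at most `neighbours` points
        if len(nearest) >= min_neighbours:
            resampled.append(nearest[0][1])  # value of the closest neighbour
        else:
            resampled.append(fill_value)
    return resampled


resampled = nn_resample(
    values=[1.0, 2.0],
    src_lon=[10.0, 10.1], src_lat=[45.0, 45.0],
    target_lon=[10.02, 10.09, 20.0], target_lat=[45.0, 45.0, 45.0],
    fill_value=-9999.0,
)
print(resampled)  # [1.0, 2.0, -9999.0]
```

The third target point has no source point within 18000 m, so it receives the fill value.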

repurpose.resample.resample_to_grid_only_valid_return(input_data, src_lon, src_lat, target_lon, target_lat, methods='nn', weight_funcs=None, min_neighbours=1, search_rad=18000, neighbours=8, fill_values=None)

Resamples data from a dictionary of numpy arrays to the given grid using pyresample. Searches for the neighbours and then resamples the data to the target grid (target_lon, target_lat) if at least min_neighbours neighbours are found.

Parameters:
  • input_data (dict of numpy.arrays) –

  • src_lon (numpy.array) – longitudes of the input data

  • src_lat (numpy.array) – latitudes of the input data

  • target_lon (numpy.array) – longitudes of the output data

  • target_lat (numpy.array) – latitudes of the output data

  • methods (string or dict, optional) – method of spatial averaging, passed on to pyresample. Can be ‘nn’ (nearest neighbour) or ‘custom’ (a custom weight function has to be supplied in weight_funcs); see the pyresample documentation for more details. Can also be a dictionary with a method for each array in the input data dict.

  • weight_funcs (function or dict of functions, optional) – if the method is ‘custom’, a function like func(distance) has to be given. Can also be a dictionary with a function for each array in the input data dict.

  • min_neighbours (int, optional) – if given, only points with at least this number of neighbours will be resampled. Default: 1

  • search_rad (float, optional) – search radius of the neighbour search, in meters. Default: 18000

  • neighbours (int, optional) – maximum number of neighbours to look for at each input grid point. Default: 8

  • fill_values (number or dict, optional) – if given, the output array is filled with this value wherever no valid resampled value could be computed; if not given, a masked array is returned. Can also be a dict with a fill value for each variable.

Returns:

  • data (dict of numpy.arrays) – resampled data on part of the target grid over which data was found

  • mask (numpy.ndarray) – boolean mask into target grid that specifies where data was resampled

Raises:

ValueError – if an empty dataset is resampled
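
The two return values can be combined to rebuild arrays on the full target grid. The sketch below uses made-up stand-ins for data and mask and assumes that each compact array is ordered like the True entries of the mask; that ordering is an assumption of this example and is not stated in the description above.

```python
import numpy as np

# Stand-ins for the documented return values (made-up numbers).
mask = np.array([True, False, True, True, False])
data = {"sm": np.array([0.21, 0.35, 0.18])}

full = {}
for name, valid_values in data.items():
    grid_values = np.full(mask.shape, np.nan)
    grid_values[mask] = valid_values  # place each value at its grid point
    full[name] = grid_values

print(full["sm"])
```

Grid points where nothing was resampled stay NaN here; any other placeholder could be used instead.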

repurpose.ts2img module

class repurpose.ts2img.Ts2Img(tsreader, imgwriter, agg_func=None, ts_buffer=1000)

Bases: object

Takes a time series dataset and converts it into an image dataset. A custom aggregate function should be given; otherwise a daily mean will be used.

Parameters:
  • tsreader (object) – object that implements an iter_ts method which iterates over pandas time series and has a grid attribute that is a pytesmo BasicGrid or CellGrid

  • imgwriter (object) – writer object that implements a write_ts method that takes a list of grid point indices and a 2D array containing the time series data

  • agg_func (function) – function that takes a pandas DataFrame and returns an aggregated pandas DataFrame

  • ts_buffer (int) – how many time series to read before writing to disk, constrained by the working memory the process should use.

calc(**tsaggkw)

Performs the conversion from time series to images.

tsbulk(gpis=None, **tsaggkw)

Iterator over batches of gpis and the corresponding time series arrays, each batch of size self.ts_buffer.

Parameters:
  • gpis (iterable, optional) – if given these gpis will be used, can be practical if the gpis are managed by an external class e.g. for parallel processing

  • tsaggkw (dict) – Keywords to give to the time series aggregation function

Returns:

  • gpi_array (numpy.array) – numpy array of gpis in this batch

  • ts_bulk (dict of numpy arrays) – for each variable one numpy array of shape (len(gpi_array), len(ts_aggregated))
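
The shape (len(gpi_array), len(ts_aggregated)) means one row per grid point and one column per aggregated time step, so an image for a given time step is a column of that array. The pure-Python sketch below shows this relationship for one variable with made-up values; it does not use the library.

```python
# Made-up aggregated series for three grid points, two time steps each.
aggregated = {
    101: [0.10, 0.12],
    102: [0.20, 0.22],
    103: [0.30, 0.32],
}

gpi_array = list(aggregated)
# One row per grid point: shape (len(gpi_array), number of time steps).
ts_bulk = [aggregated[gpi] for gpi in gpi_array]

# Transposing gives one row per time step, i.e. one image layer per timestamp.
images = [list(layer) for layer in zip(*ts_bulk)]
print(images)  # [[0.1, 0.2, 0.3], [0.12, 0.22, 0.32]]
```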

repurpose.ts2img.agg_tsmonthly(ts, **kwargs)

Parameters:
  • ts (pandas.DataFrame) – time series of a point

  • kwargs (dict) – any additional keyword arguments that are given to the ts2img object during initialization

Returns:

ts_agg – aggregated time series. The aggregated series of all points must have the same length, otherwise the conversion cannot work. Each column of this DataFrame will be a layer in the image.

Return type:

pandas.DataFrame
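
An aggregate function with this interface can be sketched with pandas. The function below is a hypothetical example, not the library's agg_tsmonthly: it takes a DataFrame with a DatetimeIndex and returns one row per calendar month holding the mean of each column.

```python
import numpy as np
import pandas as pd


def agg_monthly_mean(ts, **kwargs):
    """Hypothetical aggregate function: mean of each column per calendar month."""
    return ts.resample("MS").mean()


index = pd.date_range("2020-01-01", "2020-02-29", freq="D")
ts = pd.DataFrame({"sm": np.arange(len(index), dtype=float)}, index=index)
ts_agg = agg_monthly_mean(ts)
print(ts_agg)
```

Because every point must yield an aggregated series of the same length, input series that cover different periods would need to be brought to a common time range before or inside such a function.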

Module contents