repurpose package

Submodules

repurpose.img2ts module

class repurpose.img2ts.Img2Ts(input_dataset, outputpath, startdate, enddate, input_kwargs={}, input_grid=None, target_grid=None, imgbuffer=100, variable_rename=None, unlim_chunksize=100, cellsize_lat=180.0, cellsize_lon=360.0, r_methods='nn', r_weightf=None, r_min_n=1, r_radius=18000, r_neigh=8, r_fill_values=None, filename_templ='%04d.nc', gridname='grid.nc', global_attr=None, ts_attributes=None, ts_dtypes=None, time_units='days since 1858-11-17 00:00:00', zlib=True)

Bases: object

Class that uses the read_img iterator of the input_data dataset to read all images between startdate and enddate and saves them as netCDF time series files, according to the given netCDF class and the cell structure of the output grid.

Parameters:
  • input_dataset (DatasetImgBase like class instance) – must implement a read(date, **input_kwargs) iterator that returns a pygeobase.object_base.Image.

  • outputpath (string) – path where to save the time series to

  • startdate (date) – date from which the time series should start. Of course images have to be available from this date onwards.

  • enddate (date) – date when the time series should end. Images should be available up until this date.

  • input_kwargs (dict, optional) – keyword arguments which should be used in the read_img method of the input_dataset

  • input_grid (grid instance as defined in pytesmo.grids.grid, optional) – the grid on which input data is stored. If not given then the grid of the input dataset will be used. If the input dataset has no grid object then resampling to the target_grid is performed.

  • target_grid (grid instance as defined in pytesmo.grids.grid, optional) – the grid on which the time series will be stored. If not given then the grid of the input dataset will be used.

  • imgbuffer (int, optional) – number of days worth of images that should be read into memory before a time series is written. This parameter should be chosen so that the memory of your machine is utilized. It depends on the daily data volume of the input dataset

  • variable_rename (dict, optional) – dictionary used to rename variables if the time series should use names other than those returned as keys in the dict by the daily_images iterator.

  • unlim_chunksize (int, optional) – netCDF chunksize for unlimited variables.

  • cellsize_lat (float, optional) – if outgrid or input_data.grid are not cell grids then the cellsize in latitude direction can be specified here. Default is 1 global cell.

  • cellsize_lon (float, optional) – if outgrid or input_data.grid are not cell grids then the cellsize in longitude direction can be specified here. Default is 1 global cell.

  • r_methods (string or dict, optional) – resample methods to use if resampling is necessary, either ‘nn’ for nearest neighbour or ‘custom’ for custom weight function. Can also be a dictionary in which the method is specified for each variable

  • r_weightf (function or dict, optional) – if r_methods is custom this function will be used to calculate the weights depending on distance. This can also be a dict with a separate weight function for each variable.

  • r_min_n (int, optional) – Minimum number of neighbours on the target_grid that are required for a point to be resampled.

  • r_radius (float, optional) – resample radius in which neighbours should be searched given in meters

  • r_neigh (int, optional) – maximum number of neighbours found inside r_radius to use during resampling. If more are found the r_neigh closest neighbours will be used.

  • r_fill_values (number or dict, optional) – if given, the resampled output array will be filled with this value wherever no valid resampled value could be computed; if not given, a masked array will be returned. Can also be a dict with a fill value for each variable.

  • filename_templ (string, optional) – filename template must be a string with a string formatter for the cell number. e.g. ‘%04d.nc’ will translate to the filename ‘0001.nc’ for cell number 1.

  • gridname (string, optional) – filename of the grid which will be saved as netCDF

  • global_attr (dict, optional) – global attributes for each file

  • ts_attributes (dict, optional) – dictionary of attributes that should be set for the netCDF time series. Can be either a dictionary of attributes that will be set for all variables in input_data or a dictionary of dictionaries. In the second case the first dictionary has to have a key for each variable returned by input_data and the second level dictionary will be the dictionary of attributes for this time series.

  • ts_dtypes (numpy.dtype or dict of numpy.dtypes) – data type to use for the time series. If it is a dict, a key must exist for each variable returned by input_data. Default: None, no change from input data.

  • time_units (string, optional) – units the time axis is given in. Default: “days since 1858-11-17 00:00:00”, which is the modified Julian date. For regular images this can be set freely since the conversion is done automatically; for images with irregular timestamps it is ignored for now.

  • zlib (boolean, optional) – if True the saved netCDF files will be compressed. Default: True
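Two of the parameters above can be illustrated with the standard library alone. The sketch below shows how a filename_templ string expands for a cell number and how a timestamp maps onto the default time_units. The helper names cell_filename and days_since_mjd_epoch are hypothetical and not part of repurpose.

```python
from datetime import datetime

# Default values taken from the Img2Ts signature.
FILENAME_TEMPL = "%04d.nc"
MJD_EPOCH = datetime(1858, 11, 17)  # reference of "days since 1858-11-17 00:00:00"


def cell_filename(cell, templ=FILENAME_TEMPL):
    """Hypothetical helper: expand the filename template for one cell number."""
    return templ % cell


def days_since_mjd_epoch(timestamp):
    """Hypothetical helper: fractional days between the epoch and timestamp."""
    return (timestamp - MJD_EPOCH).total_seconds() / 86400.0


print(cell_filename(1))
print(days_since_mjd_epoch(datetime(2000, 1, 1, 12)))
```

Any template with a single integer formatter works the same way, so the zero padding can be widened if cell numbers exceed four digits.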

calc()

Go through all images and retrieve a stack of them, then go through all grid points in cell order and write to netCDF file.

img_bulk()

Yields a numpy array of self.const.imgbuffer images together with the start and end date, until all dates have been read.

Returns:

  • img_stack_dict (dict of numpy.array) – stack of daily images for each variable

  • startdate (date) – date of first image in stack

  • enddate (date) – date of last image in stack

  • datetimestack (np.array) – array of the timestamps of each image

  • jd_stack (np.array or None) – if None all observations in an image have the same observation timestamp. Otherwise it gives the julian date of each observation in img_stack_dict
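The buffering idea behind img_bulk can be sketched without the library. The generator below only splits a date range into consecutive blocks of at most imgbuffer days; it assumes one image per day and is not the actual implementation. The name date_buffers is hypothetical.

```python
from datetime import date, timedelta


def date_buffers(startdate, enddate, imgbuffer):
    """Yield (first, last) date pairs that each cover at most imgbuffer days."""
    first = startdate
    while first <= enddate:
        # Clip the last block so it never runs past enddate.
        last = min(first + timedelta(days=imgbuffer - 1), enddate)
        yield first, last
        first = last + timedelta(days=1)


buffers = list(date_buffers(date(2020, 1, 1), date(2020, 1, 10), imgbuffer=4))
for first, last in buffers:
    print(first, last)
```

A larger imgbuffer means fewer, larger blocks held in memory before each write, which is the trade-off described for the imgbuffer parameter.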

exception repurpose.img2ts.Img2TsError

Bases: Exception

repurpose.process module

repurpose.process.idx_chunks(idx, n=-1)

Yield successive n-sized chunks from list.

Parameters:
  • idx (pd.DatetimeIndex) – Time series index to split into parts

  • n (int, optional (default: -1)) – Parts to split idx up into, -1 returns the full index.
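A minimal sketch of the chunking behaviour, assuming n is the chunk size as stated in the summary line and that -1 yields the whole index in one piece. It operates on a plain list instead of a pandas index to stay dependency-free; the name chunks is hypothetical.

```python
def chunks(idx, n=-1):
    """Yield successive n-sized slices of idx; n=-1 yields idx unchanged."""
    if n == -1:
        yield idx
    else:
        for start in range(0, len(idx), n):
            yield idx[start:start + n]


print(list(chunks([0, 1, 2, 3, 4], n=2)))
print(list(chunks([0, 1, 2, 3, 4])))
```

The last chunk may be shorter than n when the length of the index is not a multiple of n.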

repurpose.process.parallel_process_async(FUNC, ITER_KWARGS, STATIC_KWARGS=None, n_proc=1, show_progress_bars=True, ignore_errors=False, log_path=None, loglevel='WARNING', verbose=False)

Applies the passed function to all elements of the passed iterables. Parallel function calls are processed ASYNCHRONOUSLY (i.e. the order of return values might differ from the order of the passed iterables)! Usually the iterable is a list of cells, but it can also be a list of e.g. images.

Parameters:
  • FUNC (Callable) – Function to call.

  • ITER_KWARGS (dict) – Container that holds iterables to split up and call in parallel with FUNC, usually something like ‘cell’: [cells, … ]. If multiple iterables are passed, they MUST HAVE THE SAME LENGTH. We iterate through all iterables and pass their elements to FUNC as individual kwargs, i.e. FUNC is called N times, where N is the length of the iterables passed in this dict. Can not be empty!

  • STATIC_KWARGS (dict, optional (default: None)) – Kwargs that are passed to FUNC in addition to each element in ITER_KWARGS. They are the same for each call of FUNC!

  • n_proc (int, optional (default: 1)) – Number of parallel workers. If 1 is chosen, we do not use a pool. In this case the return values are kept in order.

  • show_progress_bars (bool, optional (default: True)) – Show how many iterables were processed already.

  • log_path (str, optional (default: None)) – If provided, a log file is created in the passed directory.

  • loglevel (str, optional (default: "WARNING")) – Log level to use for logging. Must be one of [“DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”].

  • verbose (bool, optional (default: False)) – Print all logging messages to stdout, useful for debugging.

Returns:

results – List of return values from each function call

Return type:

list
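How ITER_KWARGS and STATIC_KWARGS combine into individual calls can be sketched as follows. The helper expand_kwargs and the example function func are hypothetical and only mimic the documented contract; the calls are made serially here, which corresponds to the ordered n_proc=1 case.

```python
def expand_kwargs(iter_kwargs, static_kwargs=None):
    """Hypothetical helper: build one kwargs dict per call of FUNC."""
    if not iter_kwargs:
        raise ValueError("ITER_KWARGS can not be empty")
    lengths = {len(values) for values in iter_kwargs.values()}
    if len(lengths) != 1:
        raise ValueError("all iterables must have the same length")
    static_kwargs = static_kwargs or {}
    keys = list(iter_kwargs)
    # zip(*...) takes the i-th element of every iterable for the i-th call.
    return [
        {**dict(zip(keys, values)), **static_kwargs}
        for values in zip(*iter_kwargs.values())
    ]


def func(cell, year, scale):
    """Example function standing in for FUNC."""
    return (cell * scale, year)


calls = expand_kwargs(
    {"cell": [1, 2, 3], "year": [2000, 2001, 2002]},
    {"scale": 10},
)
results = [func(**kwargs) for kwargs in calls]
print(results)
```

With more than one worker the same calls would be issued, but the results could come back in a different order.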

repurpose.process.rootdir() → Path

repurpose.resample module

repurpose.resample.hamming_window(radius, distances)

Hamming window filter.

Parameters:
  • radius (float32) – Radius of the window.

  • distances (numpy.ndarray) – Array with distances.

Returns:

weights – Distance weights.

Return type:

numpy.ndarray
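As an illustration of distance weighting with a Hamming window, the sketch below assumes the conventional coefficient 0.54 and a window that reaches its minimum at distance radius. This reference does not state the coefficient used by the library, so treat it as an assumption. The code is pure Python, whereas the documented function works on numpy arrays; the name hamming_weights is hypothetical.

```python
import math


def hamming_weights(radius, distances, alpha=0.54):
    """Weights that fall from 1 at distance 0 to 2*alpha - 1 at distance radius."""
    return [
        alpha + (1.0 - alpha) * math.cos(math.pi * d / radius)
        for d in distances
    ]


weights = hamming_weights(18000.0, [0.0, 9000.0, 18000.0])
print(weights)
```

A function of this shape, with the radius fixed beforehand, is the kind of callable that could serve as a custom weight function of the form func(distance).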

repurpose.resample.resample_to_grid(input_data, src_lon, src_lat, target_lon, target_lat, methods='nn', weight_funcs=None, min_neighbours=1, search_rad=18000, neighbours=8, fill_values=None)

Resamples data from a dictionary of numpy arrays to the given grid using pyresample. Searches for the neighbours and then resamples the data to the target grid (target_lon, target_lat) if at least min_neighbours neighbours are found.

Parameters:
  • input_data (dict of numpy.arrays) – Data to resample

  • src_lon (numpy.array) – longitudes of the input data

  • src_lat (numpy.array) – latitudes of the input data

  • target_lon (numpy.array) – longitudes of the output data

  • target_lat (numpy.array) – latitudes of the output data

  • methods (string or dict, optional) – method of spatial averaging. This is given to pyresample and can be ‘nn’ (nearest neighbour) or ‘custom’ (custom weight function, which has to be supplied in weight_funcs); see the pyresample documentation for more details. Can also be a dictionary with a method for each array in the input data dict.

  • weight_funcs (function or dict of functions, optional) – if method is ‘custom’, a function like func(distance) has to be given. Can also be a dictionary with a function for each array in the input data dict.

  • min_neighbours (int, optional) – if given, only points with at least this number of neighbours will be resampled. Default: 1

  • search_rad (float, optional) – search radius in meters for the neighbour search. Default: 18000

  • neighbours (int, optional) – maximum number of neighbours to look for per input grid point. Default: 8

  • fill_values (number or dict, optional) – if given, the output array will be filled with this value wherever no valid resampled value could be computed; if not given, a masked array will be returned. Can also be a dict with a fill value for each variable.

Returns:

data – resampled data on given grid

Return type:

dict of numpy.arrays

Raises:

ValueError : – if empty dataset is resampled
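The ‘nn’ method with a search radius and a fill value can be illustrated by a brute-force sketch for a single variable. It does not use pyresample and does not model min_neighbours or custom weights. The function names and the Earth radius constant are assumptions made for this example only.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an assumption of this sketch


def great_circle_m(lon1, lat1, lon2, lat2):
    """Haversine distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_neighbour(values, src_lon, src_lat, target_lon, target_lat,
                      search_rad=18000, fill_value=None):
    """Brute-force 'nn' resampling of one variable onto the target points."""
    resampled = []
    for tlon, tlat in zip(target_lon, target_lat):
        # Start with the fill value; only sources inside search_rad can replace it.
        best_value, best_dist = fill_value, search_rad
        for value, slon, slat in zip(values, src_lon, src_lat):
            dist = great_circle_m(slon, slat, tlon, tlat)
            if dist <= best_dist:
                best_value, best_dist = value, dist
        resampled.append(best_value)
    return resampled


out = nearest_neighbour(
    values=[1.0, 2.0],
    src_lon=[10.0, 10.1], src_lat=[45.0, 45.0],
    target_lon=[10.01, 10.09, 20.0], target_lat=[45.0, 45.0, 45.0],
    fill_value=-9999.0,
)
print(out)
```

The third target point lies far outside the 18000 m radius of both source points, so it receives the fill value. A k-d tree based search, as provided by pyresample, avoids the quadratic cost of this loop.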

repurpose.resample.resample_to_grid_only_valid_return(input_data, src_lon, src_lat, target_lon, target_lat, methods='nn', weight_funcs=None, min_neighbours=1, search_rad=18000, neighbours=8, fill_values=None)

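Resamples data from a dictionary of numpy arrays to the given grid using pyresample. Searches for the neighbours and then resamples the data to the target grid (target_lon, target_lat) if at least min_neighbours neighbours are found. Only the part of the target grid over which data was found is returned, together with a mask.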

Parameters:
  • input_data (dict of numpy.arrays) – Data to resample

  • src_lon (numpy.array) – longitudes of the input data

  • src_lat (numpy.array) – latitudes of the input data

  • target_lon (numpy.array) – longitudes of the output data

  • target_lat (numpy.array) – latitudes of the output data

  • methods (string or dict, optional) – method of spatial averaging. This is given to pyresample and can be ‘nn’ (nearest neighbour) or ‘custom’ (custom weight function, which has to be supplied in weight_funcs); see the pyresample documentation for more details. Can also be a dictionary with a method for each array in the input data dict.

  • weight_funcs (function or dict of functions, optional) – if method is ‘custom’, a function like func(distance) has to be given. Can also be a dictionary with a function for each array in the input data dict.

  • min_neighbours (int, optional) – if given, only points with at least this number of neighbours will be resampled. Default: 1

  • search_rad (float, optional) – search radius in meters for the neighbour search. Default: 18000

  • neighbours (int, optional) – maximum number of neighbours to look for per input grid point. Default: 8

  • fill_values (number or dict, optional) – if given, the output array will be filled with this value wherever no valid resampled value could be computed; if not given, a masked array will be returned. Can also be a dict with a fill value for each variable.

Returns:

  • data (dict of numpy.arrays) – resampled data on part of the target grid over which data was found

  • mask (numpy.ndarray) – boolean mask into target grid that specifies where data was resampled

Raises:

ValueError : – if empty dataset is resampled
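The relation between the two return values can be sketched in pure Python. The example assumes that missing target points are marked with None and that data holds one entry per True in mask. Both helper names are hypothetical, and lists stand in for numpy arrays.

```python
def split_valid(resampled, missing=None):
    """Hypothetical helper: keep only target points that received a value."""
    n_points = len(next(iter(resampled.values())))
    # A target point is valid only if every variable has a value there.
    mask = [
        all(values[i] is not missing for values in resampled.values())
        for i in range(n_points)
    ]
    data = {
        name: [value for value, keep in zip(values, mask) if keep]
        for name, values in resampled.items()
    }
    return data, mask


def scatter(valid_values, mask, fill_value):
    """Place the valid values back onto the full target grid."""
    remaining = iter(valid_values)
    return [next(remaining) if keep else fill_value for keep in mask]


data, mask = split_valid({"sm": [0.1, None, 0.3]})
print(data, mask)
print(scatter(data["sm"], mask, fill_value=-1.0))
```

Returning only the valid part keeps the arrays small when the input covers a limited region of the target grid, while the mask preserves where those values belong.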

repurpose.stack module

repurpose.ts2img module

Module contents