repurpose package

Submodules

repurpose.img2ts module

class repurpose.img2ts.Img2Ts(input_dataset, outputpath, startdate, enddate, input_kwargs={}, input_grid=None, target_grid=None, imgbuffer=100, variable_rename=None, unlim_chunksize=100, cellsize_lat=180.0, cellsize_lon=360.0, r_methods='nn', r_weightf=None, r_min_n=1, r_radius=18000, r_neigh=8, r_fill_values=None, filename_templ='%04d.nc', gridname='grid.nc', global_attr=None, ts_attributes=None, ts_dtypes=None, time_units='days since 1858-11-17 00:00:00', zlib=True)

Bases: object

Class that uses the read_img iterator of input_dataset to read all images between startdate and enddate and saves them as netCDF time series files, according to the given netCDF class and the cell structure of the output grid.

Parameters:

input_dataset (DatasetImgBase like class instance) – must implement a read(date, **input_kwargs) iterator that returns a pygeobase.object_base.Image.
outputpath (string) – path where to save the time series to
startdate (date) – date from which the time series should start. Of course images have to be available from this date onwards.
enddate (date) – date when the time series should end. Images should be available up until this date.
input_kwargs (dict, optional) – keyword arguments which should be used in the read_img method of the input_dataset
input_grid (grid instance as defined in :module:`pytesmo.grids.grid`, optional) – the grid on which input data is stored. If not given then the grid of the input dataset will be used. If the input dataset has no grid object then resampling to the target_grid is performed.
target_grid (grid instance as defined in :module:`pytesmo.grids.grid`, optional) – the grid on which the time series will be stored. If not given then the grid of the input dataset will be used
imgbuffer (int, optional) – number of days worth of images that should be read into memory before a time series is written. This parameter should be chosen so that the memory of your machine is utilized. It depends on the daily data volume of the input dataset
variable_rename (dict, optional) – dictionary that renames variables for the time series, for use when the variables should have names other than the keys returned by the daily_images iterator.
unlim_chunksize (int, optional) – netCDF chunksize for unlimited variables.
cellsize_lat (float, optional) – if outgrid or input_data.grid are not cell grids then the cellsize in latitude direction can be specified here. Default is 1 global cell.
cellsize_lon (float, optional) – if outgrid or input_data.grid are not cell grids then the cellsize in longitude direction can be specified here. Default is 1 global cell.
r_methods (string or dict, optional) – resample methods to use if resampling is necessary, either ‘nn’ for nearest neighbour or ‘custom’ for custom weight function. Can also be a dictionary in which the method is specified for each variable
r_weightf (function or dict, optional) – if r_methods is custom this function will be used to calculate the weights depending on distance. This can also be a dict with a separate weight function for each variable.
r_min_n (int, optional) – Minimum number of neighbours on the target_grid that are required for a point to be resampled.
r_radius (float, optional) – resample radius in which neighbours should be searched given in meters
r_neigh (int, optional) – maximum number of neighbours found inside r_radius to use during resampling. If more are found the r_neigh closest neighbours will be used.
r_fill_values (number or dict, optional) – if given, the resampled output array will be filled with this value wherever no valid resampled value could be computed; if not given, a masked array will be returned. Can also be a dict with a fill value for each variable.
filename_templ (string, optional) – filename template must be a string with a string formatter for the cell number. e.g. ‘%04d.nc’ will translate to the filename ‘0001.nc’ for cell number 1.
gridname (string, optional) – filename of the grid which will be saved as netCDF
global_attr (dict, optional) – global attributes for each file
ts_attributes (dict, optional) – dictionary of attributes that should be set for the netCDF time series. Can be either a dictionary of attributes that will be set for all variables in input_data or a dictionary of dictionaries. In the second case the first dictionary has to have a key for each variable returned by input_data and the second level dictionary will be the dictionary of attributes for this time series.
ts_dtypes (numpy.dtype or dict of numpy.dtypes, optional) – data type to use for the time series. If it is a dict, a key must exist for each variable returned by input_data. Default: None, no change from input data.
time_units (string, optional) – units the time axis is given in. Default: “days since 1858-11-17 00:00:00”, which is the modified Julian date. For regular images this can be set freely, since the conversion is done automatically; for images with irregular timestamps it is ignored for now.
zlib (boolean, optional) – if True, the saved netCDF files will be compressed. Default: True
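A few of the defaults above can be checked by hand. The following is a minimal sketch that uses only the standard library and does not call repurpose. The helper n_cells is hypothetical and assumes that cells tile the globe regularly in latitude and longitude, which the parameter descriptions suggest but do not state.

```python
from datetime import datetime

# filename_templ is an old-style format string applied to the cell number.
filename_templ = "%04d.nc"
filename = filename_templ % 1  # file name for cell number 1

# Default time_units count days since 1858-11-17 00:00:00
# (modified Julian date).
mjd_epoch = datetime(1858, 11, 17)
mjd = (datetime(2000, 1, 1) - mjd_epoch).total_seconds() / 86400.0


def n_cells(cellsize_lat, cellsize_lon):
    """Hypothetical helper: cell count for a regular global tiling."""
    return round(180.0 / cellsize_lat) * round(360.0 / cellsize_lon)


print(filename)               # 0001.nc
print(mjd)                    # 51544.0
print(n_cells(180.0, 360.0))  # 1, the documented default of one global cell
print(n_cells(5.0, 5.0))      # 2592
```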

calc(): goes through all images and retrieves a stack of them, then goes through all grid points in cell order and writes them to netCDF files.

img_bulk()

Yields stacks of self.const.imgbuffer images, together with their start and end dates, until all dates have been read.

Returns:

img_stack_dict (dict of numpy.array) – stack of daily images for each variable
startdate (date) – date of first image in stack
enddate (date) – date of last image in stack
datetimestack (np.array) – array of the timestamps of each image
jd_stack (np.array or None) – if None, all observations in an image have the same observation timestamp. Otherwise it gives the Julian date of each observation in img_stack_dict.
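The buffering idea behind img_bulk can be sketched as a generator that splits the period between startdate and enddate into windows of at most imgbuffer days. This is an illustration only: date_batches is a hypothetical name, and the real method also reads and stacks the image data for each window.

```python
from datetime import date, timedelta


def date_batches(startdate, enddate, imgbuffer):
    """Yield (first_day, last_day) windows of at most `imgbuffer` days."""
    first = startdate
    while first <= enddate:
        # The last window is cut off at enddate.
        last = min(first + timedelta(days=imgbuffer - 1), enddate)
        yield first, last
        first = last + timedelta(days=1)


batches = list(date_batches(date(2020, 1, 1), date(2020, 1, 10), imgbuffer=4))
for first, last in batches:
    print(first, last)
```

With ten days and a buffer of four, this produces three windows, the last of which covers only two days.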

exception repurpose.img2ts.Img2TsError: Bases: Exception

repurpose.resample module

repurpose.resample.hamming_window(radius, distances)

Hamming window filter.

Parameters:

radius (float32) – Radius of the window.
distances (numpy.ndarray) – Array with distances.

Returns:

weights – Distance weights.

Return type:

numpy.ndarray
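The formula is not stated here. As a rough illustration, the sketch below assumes the conventional Hamming coefficients (0.54 and 0.46) applied to the distance scaled by the radius; hamming_weights is a hypothetical stand-in written with plain lists, not the library function, which operates on numpy arrays.

```python
import math


def hamming_weights(radius, distances, alpha=0.54):
    """Assumed Hamming form: alpha + (1 - alpha) * cos(pi * d / radius)."""
    return [alpha + (1.0 - alpha) * math.cos(math.pi * d / radius)
            for d in distances]


# Weights at the centre, at half the radius, and at the edge of the window.
weights = hamming_weights(18000.0, [0.0, 9000.0, 18000.0])
print([round(w, 2) for w in weights])  # [1.0, 0.54, 0.08]
```

Under this assumption the weight falls smoothly from 1 at the centre to 0.08 at the search radius, so distant neighbours still contribute a little.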

repurpose.resample.resample_to_grid(input_data, src_lon, src_lat, target_lon, target_lat, methods='nn', weight_funcs=None, min_neighbours=1, search_rad=18000, neighbours=8, fill_values=None)

Resamples data from a dictionary of numpy arrays to the given grid using pyresample. Searches for neighbours and then resamples the data to the grid defined by target_lon and target_lat wherever at least min_neighbours neighbours are found.

Parameters:

input_data (dict of numpy.arrays) –
src_lon (numpy.array) – longitudes of the input data
src_lat (numpy.array) – latitudes of the input data
target_lon (numpy.array) – longitudes of the output data
target_lat (numpy.array) – latitudes of the output data
methods (string or dict, optional) – method of spatial averaging, passed on to pyresample. Either ‘nn’ for nearest neighbour or ‘custom’ for a custom weight function, which has to be supplied in weight_funcs; see the pyresample documentation for more details. Can also be a dictionary with a method for each array in the input data dict.
weight_funcs (function or dict of functions, optional) – if method is ‘custom’, a function like func(distance) has to be given. Can also be a dictionary with a function for each array in the input data dict.
min_neighbours (int, optional) – if given, only points with at least this number of neighbours will be resampled. Default: 1
search_rad (float, optional) – search radius of the neighbour search in meters. Default: 18000
neighbours (int, optional) – maximum number of neighbours to search for at each input grid point. Default: 8
fill_values (number or dict, optional) – if given, the output array will be filled with this value wherever no valid resampled value could be computed; if not given, a masked array will be returned. Can also be a dict with a fill value for each variable.

Returns:

data – resampled data on given grid

Return type:

dict of numpy.arrays

Raises:

ValueError : – if empty dataset is resampled
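To make the roles of search_rad and fill_values concrete, here is a small nearest-neighbour sketch in plain Python. It does not use pyresample or repurpose: nn_resample and great_circle_m are hypothetical names, the distance is a simple haversine on a spherical Earth, and only a single variable is handled.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical Earth, an assumption of this sketch


def great_circle_m(lon1, lat1, lon2, lat2):
    """Haversine distance in meters between two lon/lat points (degrees)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nn_resample(values, src_lon, src_lat, target_lon, target_lat,
                search_rad=18000.0, fill_value=None):
    """Take the closest source value within search_rad for each target point."""
    out = []
    for tlon, tlat in zip(target_lon, target_lat):
        best_value, best_dist = fill_value, search_rad
        for v, slon, slat in zip(values, src_lon, src_lat):
            d = great_circle_m(slon, slat, tlon, tlat)
            if d <= best_dist:
                best_value, best_dist = v, d
        out.append(best_value)
    return out


resampled = nn_resample(
    values=[0.10, 0.30],
    src_lon=[16.00, 16.50], src_lat=[48.00, 48.00],
    target_lon=[16.05, 16.45, 20.00], target_lat=[48.00, 48.00, 48.00],
    fill_value=-9999.0,
)
print(resampled)  # [0.1, 0.3, -9999.0]
```

The third target point has no source point within 18000 m, so it receives the fill value.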

repurpose.resample.resample_to_grid_only_valid_return(input_data, src_lon, src_lat, target_lon, target_lat, methods='nn', weight_funcs=None, min_neighbours=1, search_rad=18000, neighbours=8, fill_values=None)

Resamples data from a dictionary of numpy arrays to the given grid using pyresample. Searches for neighbours and then resamples the data to the grid defined by target_lon and target_lat wherever at least min_neighbours neighbours are found.

Parameters:

input_data (dict of numpy.arrays) –
src_lon (numpy.array) – longitudes of the input data
src_lat (numpy.array) – latitudes of the input data
target_lon (numpy.array) – longitudes of the output data
target_lat (numpy.array) – latitudes of the output data
methods (string or dict, optional) – method of spatial averaging, passed on to pyresample. Either ‘nn’ for nearest neighbour or ‘custom’ for a custom weight function, which has to be supplied in weight_funcs; see the pyresample documentation for more details. Can also be a dictionary with a method for each array in the input data dict.
weight_funcs (function or dict of functions, optional) – if method is ‘custom’, a function like func(distance) has to be given. Can also be a dictionary with a function for each array in the input data dict.
min_neighbours (int, optional) – if given, only points with at least this number of neighbours will be resampled. Default: 1
search_rad (float, optional) – search radius of the neighbour search in meters. Default: 18000
neighbours (int, optional) – maximum number of neighbours to search for at each input grid point. Default: 8
fill_values (number or dict, optional) – if given, the output array will be filled with this value wherever no valid resampled value could be computed; if not given, a masked array will be returned. Can also be a dict with a fill value for each variable.

Returns:

data (dict of numpy.arrays) – resampled data on part of the target grid over which data was found
mask (numpy.ndarray) – boolean mask into target grid that specifies where data was resampled

Raises:

ValueError : – if empty dataset is resampled
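The relationship between the two return values can be illustrated with plain lists. The sketch assumes that data holds one value per True entry of mask, in target grid order; scatter_valid is a hypothetical helper, and with numpy arrays the same step would usually be written as an assignment through the boolean mask.

```python
def scatter_valid(valid_values, mask, fill_value):
    """Place values for the valid points back onto the full target grid."""
    it = iter(valid_values)
    return [next(it) if is_valid else fill_value for is_valid in mask]


mask = [True, False, True, False]  # where data was resampled
valid = [0.21, 0.35]               # one value per True entry in mask
full = scatter_valid(valid, mask, fill_value=-9999.0)
print(full)  # [0.21, -9999.0, 0.35, -9999.0]
```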

repurpose.ts2img module

class repurpose.ts2img.Ts2Img(tsreader, imgwriter, agg_func=None, ts_buffer=1000)

Bases: object

Takes a time series dataset and converts it into an image dataset. A custom aggregate function should be given; otherwise a daily mean will be used.

Parameters:

tsreader (object) – object that implements an iter_ts method, which iterates over pandas time series, and has a grid attribute that is a pytesmo BasicGrid or CellGrid
imgwriter (object) – writer object that implements a write_ts method that takes a list of grid point indices and a 2D array containing the time series data
agg_func (function) – function that takes a pandas DataFrame and returns an aggregated pandas DataFrame
ts_buffer (int) – how many time series to read before writing to disk, constrained by the working memory the process should use.

calc(**tsaggkw): does the conversion from time series to images

tsbulk(gpis=None, **tsaggkw)

Iterator over gpi and time series arrays of size self.ts_buffer.

Parameters:

gpis (iterable, optional) – if given these gpis will be used, can be practical if the gpis are managed by an external class e.g. for parallel processing
tsaggkw (dict) – Keywords to give to the time series aggregation function

Returns:

gpi_array (numpy.array) – numpy array of gpis in this batch
ts_bulk (dict of numpy arrays) – for each variable one numpy array of shape (len(gpi_array), len(ts_aggregated))
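The batching by ts_buffer can be sketched with a small generator over grid point indices. This is only an illustration of the chunking: gpi_batches is a hypothetical name, and the real iterator also reads and aggregates the time series for every gpi in a batch.

```python
from itertools import islice


def gpi_batches(gpis, ts_buffer):
    """Yield lists of at most `ts_buffer` grid point indices."""
    it = iter(gpis)
    while True:
        batch = list(islice(it, ts_buffer))
        if not batch:
            return
        yield batch


chunks = list(gpi_batches(range(7), ts_buffer=3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the generator accepts any iterable, an external class that manages gpis, for example for parallel processing, could hand each worker its own subset.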

repurpose.ts2img.agg_tsmonthly(ts, **kwargs)

Parameters:

ts (pandas.DataFrame) – time series of a point
kwargs (dict) – any additional keyword arguments that are given to the Ts2Img object during initialization

Returns:

ts_agg – aggregated time series. All aggregated time series must have the same length, otherwise the conversion cannot work. Each column of this DataFrame will be a layer in the image.

Return type:

pandas.DataFrame
