mapreader.classify.datasets
Module Contents
Classes
- PatchDataset – An abstract class representing a Dataset.
- PatchContextDataset – An abstract class representing a Dataset.
Attributes
- mapreader.classify.datasets.parhugin_installed = True
- class mapreader.classify.datasets.PatchDataset(patch_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB')
Bases:
torch.utils.data.Dataset
An abstract class representing a Dataset.
All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.
Note
DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
- Parameters:
patch_df (Union[pandas.DataFrame, str]) –
transform (Union[str, torchvision.transforms.Compose, Callable]) –
delimiter (str) –
patch_paths_col (Optional[str]) –
label_col (Optional[str]) –
label_index_col (Optional[str]) –
image_mode (Optional[str]) –
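The docstring above describes the map-style dataset contract (__getitem__ and __len__) that PatchDataset fulfils. A minimal sketch of that contract in plain Python, without depending on torch (illustrative only; the real class subclasses torch.utils.data.Dataset and returns transformed image tensors):

```python
# Minimal sketch of the map-style dataset contract that PatchDataset
# implements. ToyPatchDataset is a hypothetical stand-in, not part of
# MapReader.

class ToyPatchDataset:
    def __init__(self, patch_paths, labels=None):
        # patch_paths plays the role of the `patch_paths_col` column;
        # labels plays the role of the optional `label_col`.
        self.patch_paths = list(patch_paths)
        self.labels = list(labels) if labels is not None else None

    def __len__(self):
        # Used by Sampler implementations and DataLoader defaults to
        # determine the size of the dataset.
        return len(self.patch_paths)

    def __getitem__(self, idx):
        # Fetch the data sample for a given integral index.
        path = self.patch_paths[idx]
        if self.labels is not None:
            return path, self.labels[idx]
        return path

ds = ToyPatchDataset(["patch_0.png", "patch_1.png"], labels=["rail", "no_rail"])
print(len(ds))   # 2
print(ds[1])     # ('patch_1.png', 'no_rail')
```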
- return_orig_image(idx)
Return the original image associated with the given index.
- Parameters:
idx (int or Tensor) – The index of the desired image, or a Tensor containing the index.
- Returns:
The original image associated with the given index.
- Return type:
PIL.Image.Image
Notes
This method returns the original image associated with the given index by loading the image file using the file path stored in the patch_paths_col column of the patch_df DataFrame at the given index. The loaded image is then converted to the format specified by the image_mode attribute of the object. The resulting PIL.Image.Image object is returned.
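The parameter description above says idx may be either an int or a Tensor containing the index. A sketch of how that normalisation step can work, using duck typing so it runs without torch installed (the exact handling inside return_orig_image is an assumption):

```python
# 0-dim torch tensors expose .item() to unwrap the Python scalar; duck-type
# on that so the sketch needs no torch import.

def normalize_index(idx):
    if hasattr(idx, "item"):
        idx = idx.item()
    return int(idx)

class FakeScalarTensor:
    """Hypothetical stand-in for a 0-dim torch.Tensor in this sketch."""
    def __init__(self, value):
        self._value = value
    def item(self):
        return self._value

# After normalising, return_orig_image can index patch_df, read the path
# from the patch_paths_col column, and open/convert the image.
print(normalize_index(3))                    # 3
print(normalize_index(FakeScalarTensor(7)))  # 7
```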
- class mapreader.classify.datasets.PatchContextDataset(patch_df, transform1, transform2, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB', context_save_path='./maps/maps_context', create_context=False, parent_path='./maps', x_offset=1.0, y_offset=1.0, slice_method='scale')
Bases:
PatchDataset
An abstract class representing a Dataset.
All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.
Note
DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
- Parameters:
patch_df (Union[pandas.DataFrame, str]) –
transform1 (str) –
transform2 (str) –
delimiter (str) –
patch_paths_col (Optional[str]) –
label_col (Optional[str]) –
label_index_col (Optional[str]) –
image_mode (Optional[str]) –
context_save_path (Optional[str]) –
create_context (Optional[bool]) –
parent_path (Optional[str]) –
x_offset (Optional[float]) –
y_offset (Optional[float]) –
slice_method (Optional[str]) –
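One way to read the x_offset and y_offset parameters together with slice_method='scale' is that the context window expands the patch's pixel bounds by a fraction of the patch's own width and height on each side. The exact geometry used by PatchContextDataset is an assumption here; this is only a sketch of that interpretation:

```python
# Hedged sketch: expand a patch's pixel bounds by x_offset / y_offset
# times the patch's width / height on each side to get a context window.
# context_bounds is a hypothetical helper, not MapReader's actual code.

def context_bounds(min_x, min_y, max_x, max_y, x_offset=1.0, y_offset=1.0):
    width = max_x - min_x
    height = max_y - min_y
    return (
        min_x - int(x_offset * width),
        min_y - int(y_offset * height),
        max_x + int(x_offset * width),
        max_y + int(y_offset * height),
    )

# A 100x100 patch at (100, 100) with the default offsets of 1.0 grows to
# a 300x300 context window centred on the patch.
print(context_bounds(100, 100, 200, 200))  # (0, 0, 300, 300)
```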
- save_parents(processors=10, sleep_time=0.001, use_parhugin=True, parent_delimiter='#', loc_delimiter='-', overwrite=False)
Save parent patches for all patches in the patch_df.
- Parameters:
processors (int, optional) – The number of required processors for the job, by default 10.
sleep_time (float, optional) – The time to wait between jobs, by default 0.001.
use_parhugin (bool, optional) – Flag indicating whether to use Parhugin to parallelize the job, by default True.
parent_delimiter (str, optional) – The delimiter used to separate parent IDs in the patch filename, by default “#”.
loc_delimiter (str, optional) – The delimiter used to separate patch pixel bounds in the patch filename, by default “-”.
overwrite (bool, optional) – Flag indicating whether to overwrite existing parent files, by default False.
- Return type:
None
Notes
Parhugin is a Python package for parallelizing computations across multiple CPU cores. The method uses Parhugin to parallelize the computation of saving parent patches to disk. When Parhugin is installed and use_parhugin is set to True, the method parallelizes the calling of the save_parents_idx method and its corresponding arguments. If Parhugin is not installed or use_parhugin is set to False, the method executes the loop over patch indices sequentially instead.
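The parallel-or-sequential fallback described above can be sketched with the standard library: run the per-patch job in a pool when parallelism is available and wanted, else loop sequentially. Parhugin's actual API differs, and save_one below is a hypothetical stand-in for save_parents_idx:

```python
from concurrent.futures import ThreadPoolExecutor

def save_one(idx):
    # Hypothetical stand-in for save_parents_idx(idx).
    return f"saved parent for patch {idx}"

def save_all(indices, use_parallel=True, processors=10):
    if use_parallel:
        # Parallel path: analogous to dispatching jobs via Parhugin.
        with ThreadPoolExecutor(max_workers=processors) as pool:
            return list(pool.map(save_one, indices))
    # Fallback: plain sequential loop over patch indices.
    return [save_one(i) for i in indices]

print(save_all(range(3)))
# ['saved parent for patch 0', 'saved parent for patch 1', 'saved parent for patch 2']
```

Both paths produce the same results; only the scheduling differs, which is why the sequential loop is a safe fallback when Parhugin is unavailable.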
- save_parents_idx(idx, parent_delimiter='#', loc_delimiter='-', overwrite=False, return_image=False)
Save the parents of a specific patch to the specified location.
- Parameters:
idx (int) – Index of the patch in the dataset.
parent_delimiter (str, optional) – Delimiter to split the parent names in the file path. Default is “#”.
loc_delimiter (str, optional) – Delimiter to split the location of the patch in the file path. Default is “-”.
overwrite (bool, optional) – Whether to overwrite the existing parent files. Default is False.
return_image (Optional[bool]) –
- Raises:
ValueError – If the patch is not found in the dataset.
- Return type:
None
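save_parents_idx recovers the parent name and patch pixel bounds by splitting the patch filename on the two delimiters documented above. Assuming a filename shaped like "patch-<min_x>-<min_y>-<max_x>-<max_y>#<parent_name>" (an assumption about the naming scheme, not confirmed by this page), the parse could look like:

```python
# Hedged sketch of splitting a patch filename on parent_delimiter ("#")
# and loc_delimiter ("-"). parse_patch_name is a hypothetical helper.

def parse_patch_name(name, parent_delimiter="#", loc_delimiter="-"):
    # Everything after the last parent delimiter names the parent image.
    stem, parent_name = name.rsplit(parent_delimiter, 1)
    # The last four loc-delimited fields are the pixel bounds.
    parts = stem.split(loc_delimiter)
    min_x, min_y, max_x, max_y = (int(p) for p in parts[-4:])
    return parent_name, (min_x, min_y, max_x, max_y)

print(parse_patch_name("patch-0-0-100-100#map_1.png"))
# ('map_1.png', (0, 0, 100, 100))
```

A ValueError is the natural failure mode here too: if the bounds fields are missing or non-numeric, the int() conversion raises, mirroring the "patch not found" error documented above.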
- plot_sample(idx)
Plot a sample patch and its corresponding context from the dataset.
- Parameters:
idx (int) – The index of the sample to plot.
- Returns:
Displays the plot of the sample patch and its corresponding context.
- Return type:
None
Notes
This method plots a sample patch and its corresponding context side-by-side in a single figure with two subplots. The figure size is set to 10in x 5in, and the titles of the subplots are set to “Patch” and “Context”, respectively. The resulting figure is displayed using the matplotlib library (required).