mapreader.classify.datasets

Module Contents

Classes

PatchDataset

An abstract class representing a Dataset.

PatchContextDataset

An abstract class representing a Dataset.

Attributes

parhugin_installed

mapreader.classify.datasets.parhugin_installed = True
class mapreader.classify.datasets.PatchDataset(patch_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB')

Bases: torch.utils.data.Dataset

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__(), which fetches a data sample for a given key. Subclasses may also override __len__(), which many Sampler implementations and the default options of DataLoader expect to return the size of the dataset. Subclasses may also implement __getitems__() to speed up the loading of batched samples; this method accepts a list of sample indices for a batch and returns a list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
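To illustrate the protocol described above, here is a minimal sketch of a map-style dataset with non-integer keys. It is written in pure Python so that it runs without PyTorch; the names `ToyMapDataset` and `key_sampler` and the sample data are hypothetical and not part of mapreader. A real dataset would subclass torch.utils.data.Dataset and be paired with a PyTorch sampler.

```python
class ToyMapDataset:
    """Hypothetical map-style dataset keyed by strings (not part of mapreader)."""

    def __init__(self, samples):
        # samples: dict mapping a key to a data sample
        self._samples = dict(samples)

    def __getitem__(self, key):
        # Required: fetch one sample for a given key.
        return self._samples[key]

    def __len__(self):
        # Optional, but expected by many samplers.
        return len(self._samples)

    def __getitems__(self, keys):
        # Optional: fetch a whole batch in one call.
        return [self._samples[k] for k in keys]


def key_sampler(dataset_keys):
    """Hypothetical custom sampler: yields the dataset's own keys, not 0..n-1."""
    yield from dataset_keys


samples = {"patch-a": 1.0, "patch-b": 2.0, "patch-c": 3.0}
dataset = ToyMapDataset(samples)

# Fetch samples one at a time, driven by the custom sampler.
fetched = [dataset[key] for key in key_sampler(samples)]

# Fetch a batch of samples in a single call.
batch = dataset.__getitems__(["patch-a", "patch-c"])
```

Because the keys here are strings, a default sampler that yields integers would fail on lookup, which is why a sampler that yields the dataset's own keys is needed.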

Parameters:
  • patch_df (pandas.DataFrame | str)

  • transform (str | torchvision.transforms.Compose | Callable)

  • delimiter (str)

  • patch_paths_col (str | None)

  • label_col (str | None)

  • label_index_col (str | None)

  • image_mode (str | None)

return_orig_image(idx)

Return the original image associated with the given index.

Parameters:

idx (int or Tensor) – The index of the desired image, or a Tensor containing the index.

Returns:

The original image associated with the given index.

Return type:

PIL.Image.Image

Notes

This method loads the image file from the path stored in the patch_paths_col column of the patch_df DataFrame at the given index, converts it to the mode specified by the image_mode attribute of the object, and returns the resulting PIL.Image.Image object.

create_dataloaders(set_name='infer', batch_size=16, shuffle=False, num_workers=0, **kwargs)

Creates a dictionary containing a PyTorch dataloader.

Parameters:
  • set_name (str, optional) – The name to use for the dataloader. By default "infer".

  • batch_size (int, optional) – The batch size to use for the dataloader. By default 16.

  • shuffle (bool, optional) – Whether to shuffle the PatchDataset. By default False.

  • num_workers (int, optional) – The number of worker subprocesses to use for loading data. By default 0, which loads data in the main process.

  • **kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.

Returns:

Dictionary containing dataloaders.

Return type:

Dict
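The sketch below mimics the shape of the return value and the effect of batch_size and shuffle using a pure-Python stand-in. It assumes the returned dictionary is keyed by set_name, which the description suggests but does not state outright. `iter_batches` and `make_loader_dict` are hypothetical helpers, not mapreader or PyTorch functions; the real method wraps the dataset in a PyTorch DataLoader rather than lists of indices.

```python
import random


def iter_batches(n_samples, batch_size=16, shuffle=False, seed=None):
    """Yield lists of indices, batch_size at a time (the last batch may be shorter)."""
    indices = list(range(n_samples))
    if shuffle:
        # Shuffle the sample order before splitting into batches.
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


def make_loader_dict(n_samples, set_name="infer", batch_size=16, shuffle=False):
    """Hypothetical stand-in: a dictionary with one entry, keyed by set_name."""
    return {set_name: list(iter_batches(n_samples, batch_size, shuffle))}


loaders = make_loader_dict(n_samples=40, set_name="infer", batch_size=16)

# 40 samples with batch_size=16 give two full batches and one of 8.
batch_sizes = [len(batch) for batch in loaders["infer"]]
```

With shuffle left at False the batches follow the order of the dataset, which is usually what is wanted for inference.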

class mapreader.classify.datasets.PatchContextDataset(patch_df, total_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB', context_dir='./maps/maps_context', create_context=False, parent_path='./maps')

Bases: PatchDataset

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__(), which fetches a data sample for a given key. Subclasses may also override __len__(), which many Sampler implementations and the default options of DataLoader expect to return the size of the dataset. Subclasses may also implement __getitems__() to speed up the loading of batched samples; this method accepts a list of sample indices for a batch and returns a list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.

Parameters:
  • patch_df (pandas.DataFrame | str)

  • total_df (pandas.DataFrame | str)

  • transform (str)

  • delimiter (str)

  • patch_paths_col (str | None)

  • label_col (str | None)

  • label_index_col (str | None)

  • image_mode (str | None)

  • context_dir (str | None)

  • create_context (bool)

  • parent_path (str | None)

save_context(processors=10, sleep_time=0.001, use_parhugin=True, overwrite=False)

Save context images for all patches in the patch_df.

Parameters:
  • processors (int, optional) – The number of required processors for the job, by default 10.

  • sleep_time (float, optional) – The time to wait between jobs, by default 0.001.

  • use_parhugin (bool, optional) – Whether to use Parhugin to parallelize the job, by default True.

  • overwrite (bool, optional) – Whether to overwrite existing parent files, by default False.

Return type:

None

Notes

Parhugin is a Python package for parallelizing computations across multiple CPU cores. The method uses Parhugin to parallelize the saving of parent patches to disk. When Parhugin is installed and use_parhugin is set to True, the method parallelizes the calls to the get_context_id method with their corresponding arguments. If Parhugin is not installed or use_parhugin is set to False, the method loops over the patch indices sequentially instead.
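The fallback behaviour described in these notes can be sketched as follows. This is an illustration of the pattern only: it uses `concurrent.futures.ThreadPoolExecutor` from the standard library as a stand-in for Parhugin, and `process_patch`, `run_all`, and `optional_backend_installed` are hypothetical names. How mapreader actually sets parhugin_installed or dispatches jobs is not shown in this reference.

```python
import importlib.util
from concurrent.futures import ThreadPoolExecutor

# True only if the optional package can be found. This mirrors the idea of a
# module-level flag such as parhugin_installed (an assumption, not confirmed).
optional_backend_installed = importlib.util.find_spec("parhugin") is not None


def process_patch(idx):
    """Hypothetical per-patch job standing in for get_context_id."""
    return idx * idx


def run_all(indices, use_parallel=True, backend_available=True, processors=10):
    """Run process_patch over indices, in parallel when allowed, else sequentially."""
    if use_parallel and backend_available:
        # ThreadPoolExecutor is a stand-in here, not Parhugin.
        with ThreadPoolExecutor(max_workers=processors) as pool:
            return list(pool.map(process_patch, indices))
    # Fallback: plain sequential loop over the patch indices.
    return [process_patch(idx) for idx in indices]


parallel_results = run_all(
    range(5), use_parallel=True, backend_available=optional_backend_installed
)
sequential_results = run_all(range(5), use_parallel=False)
```

Both paths produce the same results in the same order, so switching the parallel backend off changes only how long the job takes.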

get_context_id(id, overwrite=False, save_context=False, return_image=True)

Save the parents of a specific patch to the specified location.

Parameters:
  • id – Index of the patch in the dataset.

  • overwrite (bool, optional) – Whether to overwrite the existing parent files. Default is False.

  • save_context (bool, optional) – Whether to save the context image. Default is False.

  • return_image (bool, optional) – Whether to return the context image. Default is True.

Raises:

ValueError – If the patch is not found in the dataset.

Return type:

None

plot_sample(idx)

Plot a sample patch and its corresponding context from the dataset.

Parameters:

idx (int) – The index of the sample to plot.

Returns:

None. The plot of the sample patch and its corresponding context is displayed.

Return type:

None

Notes

This method plots a sample patch and its corresponding context side by side in a single figure with two subplots. The figure size is set to 10 x 5 inches, and the subplots are titled “Patch” and “Context”, respectively. The figure is displayed using matplotlib, which must be installed.