mapreader.classify.datasets
Attributes
parhugin_installed |
Classes
PatchDataset | A PyTorch Dataset class for loading image patches from a DataFrame.
PatchContextDataset | A PyTorch Dataset class for loading contextual information about image patches from a DataFrame.
Module Contents
- mapreader.classify.datasets.parhugin_installed = True
- class mapreader.classify.datasets.PatchDataset(patch_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB')
Bases: torch.utils.data.Dataset
A PyTorch Dataset class for loading image patches from a DataFrame.
- Parameters:
patch_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame) – DataFrame or path to CSV/TSV/geojson file containing the paths to image patches and their labels.
transform (Union[str, transforms.Compose, Callable]) – The transform to apply to the image. A string selects one of the default transforms: “train”, “test” or “val”. Alternatively, pass a callable object (e.g. a torchvision transform or torchvision.transforms.Compose) that takes an image and returns the transformed image. At minimum, transform should be torchvision.transforms.ToTensor().
delimiter (str, optional) – The delimiter to use when reading the CSV/TSV file. By default ",".
patch_paths_col (str, optional) – The name of the column in the DataFrame containing the image paths. Default is “image_path”.
label_col (str, optional) – The name of the column containing the image labels. Default is None.
label_index_col (str, optional) – The name of the column containing the indices of the image labels. Default is None.
image_mode (str, optional) – The color format to convert the image to. Default is “RGB”.
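As a minimal sketch of the input this class expects, the snippet below writes a small delimited file with an `image_path` column and a label column, then passes its path as `patch_df`. The helper names `write_patch_csv` and `load_patch_dataset`, the column name `label` and the file paths are illustrative assumptions, not part of MapReader; the constructor arguments are the ones listed above. `load_patch_dataset` assumes MapReader is installed and that the image files exist on disk.

```python
import csv
from pathlib import Path


def write_patch_csv(csv_path, rows, delimiter=","):
    """Write (image_path, label) rows to a delimited file with a header row."""
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        # "image_path" matches the documented default for patch_paths_col.
        writer.writerow(["image_path", "label"])
        writer.writerows(rows)
    return csv_path


def load_patch_dataset(csv_path, delimiter=","):
    """Build a PatchDataset from the file (needs MapReader and real images)."""
    # Imported lazily so the CSV helper above works without MapReader installed.
    from mapreader.classify.datasets import PatchDataset

    return PatchDataset(
        patch_df=csv_path,
        transform="test",  # one of the documented default transform names
        delimiter=delimiter,
        patch_paths_col="image_path",
        label_col="label",  # hypothetical column name used in this sketch
    )
```

For a TSV file, pass `delimiter="\t"` to both helpers so that the file is written and read with the same separator.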
- patch_df
DataFrame containing the paths to image patches and their labels.
- Type:
pandas.DataFrame or gpd.GeoDataFrame
- label_col
The name of the column containing the image labels.
- Type:
str
- label_index_col
The name of the column containing the label indices.
- Type:
str
- patch_paths_col
The name of the column in the DataFrame containing the image paths.
- Type:
str
- image_mode
The color format to convert the image to.
- Type:
str
- unique_labels
The unique labels in the label column of the patch_df DataFrame.
- Type:
list
- transform
A callable object (a torchvision transform) that takes in an image and performs image transformations.
- Type:
callable
- __len__()
Returns the length of the dataset.
- Return type:
int
- __getitem__(idx)
Retrieves the image, its label and the index of that label at the given index in the dataset.
- Parameters:
idx (int | torch.Tensor)
- Return type:
tuple[tuple[torch.Tensor], str, int]
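The return shape can be easier to read from a toy example. The class below is a pure-Python stand-in, not MapReader's implementation: `ToyPatchDataset` and its `rows` argument are hypothetical, and plain strings take the place of image tensors. It only illustrates the `__len__`/`__getitem__` protocol and the `(images, label, label_index)` layout described above, including one common way of accepting a tensor-like index.

```python
class ToyPatchDataset:
    """Pure-Python stand-in that mimics the documented return shape."""

    def __init__(self, rows, unique_labels):
        # rows: list of (image, label) pairs; "image" can be any object here.
        self.rows = list(rows)
        self.unique_labels = list(unique_labels)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        # Accept tensor-like indices by converting them to a plain int first.
        if hasattr(idx, "tolist"):
            idx = idx.tolist()
        image, label = self.rows[idx]
        label_index = self.unique_labels.index(label)
        # (image,) is a 1-tuple, mirroring tuple[tuple[Tensor], str, int].
        return (image,), label, label_index


dataset = ToyPatchDataset(
    rows=[("img0", "building"), ("img1", "no_building")],
    unique_labels=["building", "no_building"],
)
(image,), label, label_index = dataset[1]
```

Note that the first element is itself a tuple, so unpacking needs the extra parentheses shown on the last line.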
- return_orig_image(idx)
Retrieves the original image at the given index in the dataset.
- Parameters:
idx (int | torch.Tensor)
- Return type:
PIL.Image
- _default_transform(t_type, resize)
Returns a transforms.Compose containing the default image transformations for the train and validation sets.
- Parameters:
t_type (str | None)
resize (int | tuple[int, int] | None)
- Return type:
torchvision.transforms.Compose
- Raises:
ValueError – If label_col is not in patch_df.
ValueError – If label_index_col is not in patch_df.
ValueError – If transform is passed as a string but is not one of “train”, “test” or “val”.
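The three error conditions can be sketched as a standalone check. This is an illustration of the documented behaviour under stated assumptions, not the library's code: `check_patch_dataset_args` and `VALID_TRANSFORM_NAMES` are hypothetical names, and `columns` stands in for the column names of `patch_df`.

```python
VALID_TRANSFORM_NAMES = ("train", "test", "val")


def check_patch_dataset_args(columns, transform, label_col=None, label_index_col=None):
    """Raise ValueError for the three documented failure cases."""
    # A column name only needs to exist when the caller actually supplied one.
    if label_col is not None and label_col not in columns:
        raise ValueError(f"label_col {label_col!r} not in patch_df")
    if label_index_col is not None and label_index_col not in columns:
        raise ValueError(f"label_index_col {label_index_col!r} not in patch_df")
    # Callables are accepted as-is; only string shortcuts are validated.
    if isinstance(transform, str) and transform not in VALID_TRANSFORM_NAMES:
        raise ValueError(
            f"transform {transform!r} must be one of {VALID_TRANSFORM_NAMES}"
        )
```

Running a check like this before building the dataset surfaces a misspelled column name or transform string early, rather than at the first `__getitem__` call.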
- Parameters:
patch_df (str | pathlib.Path | pandas.DataFrame | geopandas.GeoDataFrame)
transform (str | torchvision.transforms.Compose | Callable)
delimiter (str)
patch_paths_col (str | None)
label_col (str | None)
label_index_col (str | None)
image_mode (str | None)
- label_col = None
- label_index_col = None
- image_mode = 'RGB'
- patch_paths_col = 'image_path'
- unique_labels = []
- return_orig_image(idx)
Return the original image associated with the given index.
- Parameters:
idx (int or Tensor) – The index of the desired image, or a Tensor containing the index.
- Returns:
The original image associated with the given index.
- Return type:
PIL.Image.Image
Notes
This method loads the image file from the path stored in the patch_paths_col column of the patch_df DataFrame at the given index, converts it to the format specified by the image_mode attribute, and returns the resulting PIL.Image.Image object.
- create_dataloaders(set_name='infer', batch_size=16, shuffle=False, num_workers=0, **kwargs)
Creates a dictionary containing a PyTorch dataloader.
- Parameters:
set_name (str, optional) – The name to use for the dataloader.
batch_size (int, optional) – The batch size to use for the dataloader. By default 16.
shuffle (bool, optional) – Whether to shuffle the PatchDataset. By default False.
num_workers (int, optional) – The number of worker processes to use for loading data. By default 0.
**kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.
- Returns:
Dictionary containing dataloaders.
- Return type:
Dict
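A rough sketch of what this method amounts to is shown below. `make_dataloaders` is a hypothetical equivalent that wraps a single `torch.utils.data.DataLoader` in a dictionary; keying the dictionary by `set_name` is an assumption based on the parameter description, not something this page states. `expected_num_batches` is an illustrative helper for reasoning about how `batch_size` translates into the number of batches; `drop_last` is a standard `DataLoader` option that could be passed through `**kwargs`.

```python
import math


def expected_num_batches(num_samples, batch_size=16, drop_last=False):
    """Number of batches a DataLoader yields for a dataset of num_samples items."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    # An incomplete final batch is kept as a smaller batch.
    return math.ceil(num_samples / batch_size)


def make_dataloaders(dataset, set_name="infer", batch_size=16, shuffle=False,
                     num_workers=0, **kwargs):
    """Hypothetical equivalent: one DataLoader in a dict keyed by set_name."""
    # Imported lazily so the helper above can be used without PyTorch installed.
    from torch.utils.data import DataLoader

    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        **kwargs,
    )
    return {set_name: loader}
```

With the default `batch_size=16`, a dataset of 100 patches gives seven batches: six full ones and a final batch of four.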
- class mapreader.classify.datasets.PatchContextDataset(patch_df, total_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB', context_dir='./maps/maps_context', create_context=False, parent_path='./maps')
Bases: PatchDataset
A PyTorch Dataset class for loading contextual information about image patches from a DataFrame.
- Parameters:
patch_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame) – DataFrame or path to CSV/TSV/geojson file containing the paths to image patches and their labels.
total_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame) – DataFrame or path to CSV/TSV/geojson file containing the paths to all images and their labels.
transform (str) – Torchvision transform to be applied to context images. Either “train” or “val”.
delimiter (str) – The delimiter to use when reading the CSV/TSV file. By default ",".
patch_paths_col (str, optional) – The name of the column in the DataFrame containing the image paths. Default is “image_path”.
label_col (str, optional) – The name of the column containing the image labels. Default is None.
label_index_col (str, optional) – The name of the column containing the indices of the image labels. Default is None.
image_mode (str, optional) – The color space of the images. Default is “RGB”.
context_dir (str, optional) – The path to context maps (or, where to save context if not created yet). Default is “./maps/maps_context”.
create_context (bool, optional) – Whether or not to create context maps. Default is False.
parent_path (str, optional) – The path to the directory containing parent images. Default is “./maps”.
- patch_df
DataFrame with columns representing image paths, labels, and object bounding boxes.
- Type:
pandas.DataFrame or gpd.GeoDataFrame
- label_col
The name of the column containing the image labels.
- Type:
str
- label_index_col
The name of the column containing the label indices.
- Type:
str
- patch_paths_col
The name of the column in the DataFrame containing the image paths.
- Type:
str
- image_mode
The color space of the images.
- Type:
str
- parent_path
The path to the directory containing parent images.
- Type:
str
- create_context
Whether or not to create context maps.
- Type:
bool
- context_dir
The path to context maps.
- Type:
str
- unique_labels
The unique labels in label_col.
- Type:
list or str
- label_col = None
- label_index_col = None
- image_mode = 'RGB'
- patch_paths_col = 'image_path'
- parent_path = './maps'
- create_context = False
- context_dir = b'.'
- save_context(processors=10, sleep_time=0.001, use_parhugin=True, overwrite=False)
Save context images for all patches in the patch_df.
- Parameters:
processors (int, optional) – The number of required processors for the job, by default 10.
sleep_time (float, optional) – The time to wait between jobs, by default 0.001.
use_parhugin (bool, optional) – Whether to use Parhugin to parallelize the job, by default True.
overwrite (bool, optional) – Whether to overwrite existing parent files, by default False.
- Return type:
None
Notes
Parhugin is a Python package for parallelizing computations across multiple CPU cores. When Parhugin is installed and use_parhugin is set to True, this method uses it to parallelize the calls to the get_context_id method, one call per patch. If Parhugin is not installed or use_parhugin is set to False, the method loops over the patch indices sequentially instead.
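The parallel-or-sequential pattern described in these notes can be sketched with the standard library. This sketch swaps in `concurrent.futures.ThreadPoolExecutor` as a stand-in for Parhugin, whose API is not shown on this page; `run_for_all_patches`, `task` and `use_parallel` are hypothetical names, with `task` playing the role of a per-patch call such as `get_context_id`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_for_all_patches(task, indices, processors=10, use_parallel=True):
    """Apply task to every index, in parallel when requested, else in a loop."""
    indices = list(indices)
    if use_parallel and processors > 1:
        # Stand-in for Parhugin: a standard-library thread pool.
        with ThreadPoolExecutor(max_workers=processors) as pool:
            # Executor.map returns results in the order of the inputs.
            return list(pool.map(task, indices))
    # Sequential fallback, mirroring the behaviour described in the notes.
    return [task(index) for index in indices]
```

Both branches return results in input order, so callers do not need to know which path was taken.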
- get_context_id(id, overwrite=False, save_context=False, return_image=True)
Save the parents of a specific patch to the specified location.
- Parameters:
id – Index of the patch in the dataset.
overwrite (bool, optional) – Whether to overwrite the existing parent files. Default is False.
save_context (bool, optional) – Whether to save the context image. Default is False.
return_image (bool, optional) – Whether to return the context image. Default is True.
- Raises:
ValueError – If the patch is not found in the dataset.
- Return type:
None
- plot_sample(idx)
Plot a sample patch and its corresponding context from the dataset.
- Parameters:
idx (int) – The index of the sample to plot.
- Returns:
Displays the plot of the sample patch and its corresponding context.
- Return type:
None
Notes
This method plots a sample patch and its corresponding context side-by-side in a single figure with two subplots. The figure size is set to 10in x 5in, and the titles of the subplots are set to “Patch” and “Context”, respectively. The resulting figure is displayed using the matplotlib library (required).