mapreader.classify.load_annotations

Classes

AnnotationsLoader

A class for loading annotations and preparing datasets and dataloaders for use in training/validation of a model.

Module Contents

class mapreader.classify.load_annotations.AnnotationsLoader

A class for loading annotations and preparing datasets and dataloaders for use in training/validation of a model.

annotations
labels_map
reviewed
patch_paths_col = None
label_col = None
datasets = None
load(annotations, labels_map=None, delimiter=',', images_dir=None, remove_broken=True, ignore_broken=False, patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)

Loads annotations from a CSV/TSV/geojson file, a pandas DataFrame or a geopandas GeoDataFrame. Sets the patch_paths_col and label_col attributes.

Parameters:
  • annotations (str | pathlib.Path | pd.DataFrame | gpd.GeoDataFrame) – The annotations. Can either be the path to a CSV/TSV/geojson file, a pandas DataFrame or a geopandas GeoDataFrame.

  • labels_map (Optional[dict], optional) – A dictionary mapping labels to indices. If not provided, labels will be mapped to indices based on the order in which they appear in the annotations dataframe. By default None.

  • delimiter (str, optional) – The delimiter to use when loading the csv file as a DataFrame, by default “,”.

  • images_dir (str, optional) – The path to the directory in which patches are stored. This argument should be passed if image paths are different from the path saved in existing annotations. If None, no updates will be made to the image paths in the annotations DataFrame/csv. By default None.

  • remove_broken (bool, optional) – Whether to remove annotations with broken image paths. If False, annotations with broken paths will remain in annotations DataFrame and may cause issues! By default True.

  • ignore_broken (bool, optional) – Whether to ignore broken image paths (only valid if remove_broken=False). If True, annotations with broken paths will remain in the annotations DataFrame and no error will be raised. This may cause issues! If False, annotations with broken paths will raise an error. By default False.

  • patch_paths_col (str, optional) – The name of the column containing the image paths, by default “image_path”.

  • label_col (str, optional) – The name of the column containing the image labels, by default “label”.

  • append (bool, optional) – Whether to append the annotations to a pre-existing annotations DataFrame. If False, existing DataFrame will be overwritten. By default True.

  • scramble_frame (bool, optional) – Whether to shuffle the rows of the DataFrame, by default False.

  • reset_index (bool, optional) – Whether to reset the index of the DataFrame (e.g. after shuffling), by default False.

Raises:

ValueError – If annotations is passed as something other than a string, pathlib.Path, pd.DataFrame or gpd.GeoDataFrame.
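
The default behaviour described for labels_map (indices assigned in the order labels first appear) can be illustrated with a minimal sketch. This is not MapReader's implementation: default_labels_map is a hypothetical helper, the label names are made up, and the orientation (label as key, index as value) simply follows the wording of the parameter description above.

```python
def default_labels_map(labels):
    """Assign an index to each distinct label, in order of first appearance.

    Hypothetical helper for illustration only; not part of MapReader.
    """
    labels_map = {}
    for label in labels:
        if label not in labels_map:
            # The next free index is the number of labels seen so far.
            labels_map[label] = len(labels_map)
    return labels_map


# Made-up label column, as it might appear in an annotations file.
labels = ["no_rail", "rail", "no_rail", "building", "rail"]
print(default_labels_map(labels))
# {'no_rail': 0, 'rail': 1, 'building': 2}
```

Passing an explicit labels_map avoids depending on row order, which matters if the annotations are shuffled (scramble_frame=True) or appended across several files.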

show_patch(patch_id)

Display a patch and its label.

Parameters:

patch_id (str) – The image ID of the patch to show.

Return type:

None

print_unique_labels()

Prints the unique labels found in the annotations.

Raises:

ValueError – If no annotations are found.

Return type:

None

review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')

Perform image review on annotations and update labels for a given label or all labels.

Parameters:
  • label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed, by default None.

  • chunks (int, optional) – The number of images to display at a time, by default 24.

  • num_cols (int, optional) – The number of columns in the display, by default 8.

  • exclude_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – A DataFrame of images to exclude from review, by default None.

  • include_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – A DataFrame of images to include for review, by default None.

  • deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".

Return type:

None

Notes

This method reviews images with their corresponding labels and allows the user to change the label for each image.

Updated labels are saved in annotations and in a newly created reviewed DataFrame.

If exclude_df is provided, images found in this df are skipped in the review process.

If include_df is provided, only images found in this df are reviewed.

The reviewed DataFrame is deduplicated based on the deduplicate_col.
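
The selection rules in these notes can be sketched in plain Python. This is an illustration under assumptions rather than the library's code: select_for_review and chunked are hypothetical helpers, rows are plain dictionaries instead of DataFrame rows, and images are matched on an "image_id" key, which is an assumption.

```python
def select_for_review(rows, label_to_review=None, exclude_ids=None, include_ids=None):
    """Return the rows that would be shown for review (hypothetical helper)."""
    selected = []
    for row in rows:
        # Only keep the target label, if one was given.
        if label_to_review is not None and row["label"] != label_to_review:
            continue
        # Skip anything listed for exclusion.
        if exclude_ids is not None and row["image_id"] in exclude_ids:
            continue
        # If an inclusion list is given, keep only images found in it.
        if include_ids is not None and row["image_id"] not in include_ids:
            continue
        selected.append(row)
    return selected


def chunked(items, chunks=24):
    """Split items into consecutive groups of at most `chunks` items."""
    return [items[i:i + chunks] for i in range(0, len(items), chunks)]


# Made-up annotations: 60 patches with alternating labels.
rows = [
    {"image_id": f"patch-{i}", "label": "rail" if i % 2 == 0 else "no_rail"}
    for i in range(60)
]
to_review = select_for_review(rows, label_to_review="rail", exclude_ids={"patch-0"})
pages = chunked(to_review, chunks=24)
print([len(page) for page in pages])
# [24, 5]
```

With the defaults chunks=24 and num_cols=8, each page of 24 images would fill three rows of eight.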

show_sample(label_to_show, num_samples=9)

Show a random sample of images with the specified label (label_to_show).

Parameters:
  • label_to_show (str) – The label of the images to show.

  • num_samples (int | None, optional) – The number of images to show. If None, all images with the specified label will be shown. By default 9.

Return type:

None
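
A minimal sketch of the sampling rule described above, assuming rows are plain dictionaries with a "label" key. sample_with_label is a hypothetical helper, and returning every match when fewer than num_samples images are available is an assumption, not documented behaviour.

```python
import random


def sample_with_label(rows, label_to_show, num_samples=9, seed=None):
    """Pick a random sample of rows carrying the given label (hypothetical helper)."""
    matching = [row for row in rows if row["label"] == label_to_show]
    # None means "show everything"; also fall back to everything when there
    # are not enough matches to draw num_samples from (assumed behaviour).
    if num_samples is None or num_samples >= len(matching):
        return matching
    return random.Random(seed).sample(matching, num_samples)


# Made-up annotations: 20 "rail" patches and 20 "no_rail" patches.
rows = [{"image_id": f"patch-{i}", "label": "rail" if i < 20 else "no_rail"}
        for i in range(40)]
print(len(sample_with_label(rows, "rail")))                    # 9
print(len(sample_with_label(rows, "rail", num_samples=None)))  # 20
```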

create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test', context_datasets=False, context_df=None)

Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.

Parameters:
  • frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.

  • frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.

  • frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.

  • random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.

  • train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.

  • val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.

  • test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.

  • context_datasets (bool, optional) – Whether to create context datasets or not. By default False.

  • context_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame, optional) – The DataFrame containing all patches if using context datasets. Used to create context images. By default None.

Raises:

ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.

Return type:

None

Notes

This method saves the split datasets as a dictionary in self.datasets.

The dataset is split according to the fractions provided by the user, with each subset stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in the column). The split is performed by running train_test_split() twice.

See PatchDataset for more information on transforms.
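
The two-pass idea can be sketched without any third-party dependency. The code below is a standard-library stand-in for a stratified train_test_split(), not the library's implementation: split_once and three_way_split are hypothetical helpers, and splitting off the training set first is an assumption about the order of the two passes. The point to note is that the second pass uses a fraction relative to what remains, frac_val / (frac_val + frac_test), rather than frac_val itself.

```python
import math
import random
from collections import defaultdict


def split_once(indices, labels, frac_first, rng):
    """Stratified split: within each label, frac_first of the items go first."""
    by_label = defaultdict(list)
    for i in indices:
        by_label[labels[i]].append(i)
    first, second = [], []
    for members in by_label.values():
        rng.shuffle(members)
        cut = round(len(members) * frac_first)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


def three_way_split(labels, frac_train=0.7, frac_val=0.15, frac_test=0.15,
                    random_state=1364):
    """Split row indices into train/val/test using two stratified passes."""
    if not math.isclose(frac_train + frac_val + frac_test, 1.0):
        raise ValueError("frac_train, frac_val and frac_test must sum to 1.")
    rng = random.Random(random_state)
    # Pass 1: separate the training set from everything else.
    train, rest = split_once(list(range(len(labels))), labels, frac_train, rng)
    # Pass 2: divide the remainder between validation and test.
    remainder = frac_val + frac_test
    rel_val = frac_val / remainder if remainder > 0 else 0.0
    val, test = split_once(rest, labels, rel_val, rng)
    return {"train": train, "val": val, "test": test}


# Made-up label column: 60 "a" rows followed by 40 "b" rows.
labels = ["a"] * 60 + ["b"] * 40
splits = three_way_split(labels)
print({name: len(idx) for name, idx in splits.items()})
# {'train': 70, 'val': 15, 'test': 15}
```

With the default fractions the second pass splits the remaining 30% in half, which is how 0.15 and 0.15 of the full dataset are obtained.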

create_patch_datasets(train_transform, val_transform, test_transform, df_train, df_val, df_test)
create_patch_context_datasets(context_df, train_transform, val_transform, test_transform, df_train, df_val, df_test)
create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)

Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders and returns it.

Parameters:
  • batch_size (int, optional) – The batch size to use for the dataloader. By default 16.

  • sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset. By default “default”.

  • shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.

  • num_workers (int, optional) – The number of worker subprocesses to use for loading data. If 0, data is loaded in the main process. By default 0.

  • **kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.

Returns:

Dictionary containing dataloaders.

Return type:

Dict

Notes

sampler will only be applied to the training dataset (datasets["train"]).
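
Putting the methods above together, a typical sequence might look like the sketch below. It uses only the signatures documented on this page; build_dataloaders is a hypothetical wrapper, the file path in the usage comment is made up, and the assumption that the returned dictionary uses the same keys as the datasets (such as "train") is not confirmed here. Running it requires MapReader to be installed and a real annotations file.

```python
def build_dataloaders(annotations_path, images_dir=None, batch_size=16):
    """Load annotations, split them and build dataloaders (hypothetical wrapper)."""
    # Imported inside the function so that this sketch can be defined
    # even where MapReader is not installed.
    from mapreader.classify.load_annotations import AnnotationsLoader

    loader = AnnotationsLoader()
    # Column names are the documented defaults, written out for clarity.
    loader.load(
        annotations_path,
        images_dir=images_dir,
        patch_paths_col="image_path",
        label_col="label",
    )
    loader.print_unique_labels()
    # Documented default fractions; they must sum to 1.
    loader.create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15)
    # The sampler only affects the training dataset.
    return loader.create_dataloaders(
        batch_size=batch_size, sampler="default", shuffle=False
    )


# Example call (hypothetical path):
# dataloaders = build_dataloaders("./annotations.csv")
```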