mapreader.classify.load_annotations

Module Contents

Classes

AnnotationsLoader

class mapreader.classify.load_annotations.AnnotationsLoader
load(annotations, delimiter=',', images_dir=None, remove_broken=True, ignore_broken=False, patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)

Loads annotations from a csv file or dataframe and can be used to set the patch_paths_col and label_col attributes.

Parameters:
  • annotations (Union[str, pd.DataFrame]) – The annotations. Can either be the path to a csv file or a pandas.DataFrame.

  • delimiter (Optional[str], optional) – The delimiter to use when loading the csv file as a dataframe, by default “,”.

  • images_dir (Optional[str], optional) – The path to the directory in which patches are stored. This argument should be passed if image paths are different from the path saved in annotations dataframe/csv. If None, no updates will be made to the image paths in the annotations dataframe/csv. By default None.

  • remove_broken (Optional[bool], optional) – Whether to remove annotations with broken image paths. If False, annotations with broken paths will remain in annotations dataframe and may cause issues! By default True.

  • ignore_broken (Optional[bool], optional) – Whether to ignore broken image paths (only valid if remove_broken=False). If True, annotations with broken paths will remain in the annotations dataframe and no error will be raised. This may cause issues! If False, annotations with broken paths will raise an error. By default False.

  • patch_paths_col (Optional[str], optional) – The name of the column containing the image paths, by default “image_path”.

  • label_col (Optional[str], optional) – The name of the column containing the image labels, by default “label”.

  • append (Optional[bool], optional) – Whether to append the annotations to a pre-existing annotations dataframe. If False, existing dataframe will be overwritten. By default True.

  • scramble_frame (Optional[bool], optional) – Whether to shuffle the rows of the dataframe, by default False.

  • reset_index (Optional[bool], optional) – Whether to reset the index of the dataframe (e.g. after shuffling), by default False.

Raises:

ValueError – If annotations is passed as something other than a string or pd.DataFrame.
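The interaction between remove_broken and ignore_broken can be sketched as follows. This is a minimal illustration in plain Python, not MapReader's implementation: filter_broken is a hypothetical helper, rows are plain dictionaries rather than a dataframe, and raising ValueError for broken paths is an assumption (the documentation above only says that an error is raised).

```python
import os
import tempfile


def filter_broken(rows, path_key="image_path", remove_broken=True, ignore_broken=False):
    """Hypothetical helper mirroring the documented remove_broken / ignore_broken options."""
    broken = [row for row in rows if not os.path.exists(row[path_key])]
    if not broken:
        return list(rows)
    if remove_broken:
        # Drop every annotation whose image file cannot be found.
        return [row for row in rows if os.path.exists(row[path_key])]
    if ignore_broken:
        # Keep everything, including the broken paths, without raising.
        return list(rows)
    # Assumption: the error type is not stated in the docs; ValueError is used here.
    raise ValueError(f"{len(broken)} annotation(s) have broken image paths")


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "patch-0.png")
    open(good_path, "wb").close()  # create an empty file so the path exists
    rows = [
        {"image_path": good_path, "label": "rail"},
        {"image_path": os.path.join(tmp, "missing.png"), "label": "no_rail"},
    ]

    kept = filter_broken(rows)  # defaults: broken rows removed
    kept_all = filter_broken(rows, remove_broken=False, ignore_broken=True)
    try:
        filter_broken(rows, remove_broken=False, ignore_broken=False)
        raised = False
    except ValueError:
        raised = True
```

With the defaults only the row with an existing file survives; with remove_broken=False the outcome depends on ignore_broken.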

show_patch(patch_id)

Display a patch and its label.

Parameters:

patch_id (str) – The image ID of the patch to show.

Return type:

None

print_unique_labels()

Prints the unique labels found in the annotations.

Raises:

ValueError – If no annotations are found.

Return type:

None

review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')

Perform image review on annotations and update labels for a given label or all labels.

Parameters:
  • label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed, by default None.

  • chunks (int, optional) – The number of images to display at a time, by default 24.

  • num_cols (int, optional) – The number of columns in the display, by default 8.

  • exclude_df (pandas.DataFrame, optional) – A DataFrame of images to exclude from review, by default None.

  • include_df (pandas.DataFrame, optional) – A DataFrame of images to include for review, by default None.

  • deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".

Return type:

None

Notes

This method reviews images with their corresponding labels and allows the user to change the label for each image.

Updated labels are saved in self.annotations and in a newly created self.reviewed DataFrame. If exclude_df is provided, images found in this df are skipped in the review process. If include_df is provided, only images found in this df are reviewed. The self.reviewed DataFrame is deduplicated based on the deduplicate_col.
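The filtering and deduplication described in these notes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: select_for_review and deduplicate are hypothetical helpers, annotations are plain dictionaries, exclude_ids and include_ids stand in for the image IDs contained in exclude_df and include_df, and keeping the most recent entry when deduplicating is an assumption.

```python
def select_for_review(annotations, label_to_review=None,
                      exclude_ids=None, include_ids=None, id_key="image_id"):
    """Hypothetical helper: pick the annotations that would be shown for review."""
    selected = []
    for row in annotations:
        if label_to_review is not None and row["label"] != label_to_review:
            continue  # only the target label is reviewed
        if exclude_ids is not None and row[id_key] in exclude_ids:
            continue  # images listed for exclusion are skipped
        if include_ids is not None and row[id_key] not in include_ids:
            continue  # when an include list is given, review only those images
        selected.append(row)
    return selected


def deduplicate(reviewed, key="image_id"):
    """Hypothetical helper: keep one entry per key (assumption: the latest one wins)."""
    latest = {}
    for row in reviewed:
        latest[row[key]] = row
    return list(latest.values())


annotations = [
    {"image_id": "patch-0", "label": "rail"},
    {"image_id": "patch-1", "label": "rail"},
    {"image_id": "patch-2", "label": "no_rail"},
    {"image_id": "patch-3", "label": "rail"},
]

to_review = select_for_review(annotations, label_to_review="rail", exclude_ids={"patch-1"})
review_ids = [row["image_id"] for row in to_review]

reviewed = deduplicate([
    {"image_id": "patch-0", "label": "rail"},
    {"image_id": "patch-0", "label": "no_rail"},  # same image reviewed again
])
```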

show_sample(label_to_show, num_samples=9)

Show a random sample of images with the specified label (label_to_show).

Parameters:
  • label_to_show (str) – The label of the images to show.

  • num_samples (int | None, optional) – The number of images to show. If None, all images with the specified label will be shown. By default 9.

Return type:

None

create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test', context_datasets=False, context_df=None)

Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.

Parameters:
  • frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.

  • frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.

  • frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.

  • random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.

  • train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.

  • val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.

  • test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.

  • context_datasets (bool, optional) – Whether to create context datasets or not. By default False.

  • context_df (str or pandas.DataFrame, optional) – The dataframe containing all patches if using context datasets. Used to create context images. By default None.

Raises:

ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.

Return type:

None

Notes

This method saves the split datasets as a dictionary in self.datasets.

The dataset is split according to the fractions provided by the user, and each subset is stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in that column). The split is performed by running train_test_split() twice.

See PatchDataset for more information on transforms.
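The two-stage split can be illustrated with a small plain-Python sketch. It does not call train_test_split() and is not MapReader's implementation: stratified_split and three_way_split are hypothetical helpers, rows are dictionaries, and stratifying on a "label" key is an assumption. The point it shows is the arithmetic: the second split uses frac_val / (frac_val + frac_test), i.e. the validation share of what remains after the training set is taken out.

```python
import random
from collections import defaultdict


def stratified_split(rows, frac_first, label_key="label", seed=1364):
    """Hypothetical helper: split rows in two, keeping label proportions similar."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    first, rest = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * frac_first)  # per-label share for the first subset
        first.extend(group[:cut])
        rest.extend(group[cut:])
    return first, rest


def three_way_split(rows, frac_train=0.7, frac_val=0.15, frac_test=0.15, seed=1364):
    """Hypothetical helper: train/val/test split done as two consecutive splits."""
    if abs(frac_train + frac_val + frac_test - 1.0) > 1e-9:
        raise ValueError("frac_train, frac_val and frac_test must sum to 1")
    # First split: training set versus everything else.
    train, rest = stratified_split(rows, frac_train, seed=seed)
    # Second split: validation share relative to the remainder only.
    val, test = stratified_split(rest, frac_val / (frac_val + frac_test), seed=seed)
    return {"train": train, "val": val, "test": test}


rows = [
    {"image_id": f"patch-{i}", "label": "rail" if i % 4 == 0 else "no_rail"}
    for i in range(200)
]
splits = three_way_split(rows)
sizes = {name: len(part) for name, part in splits.items()}
train_rail_share = sum(r["label"] == "rail" for r in splits["train"]) / sizes["train"]
```

In this toy data a quarter of the rows are labelled "rail", and the training subset keeps that same share.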

create_patch_datasets(train_transform, val_transform, test_transform, df_train, df_val, df_test)
create_patch_context_datasets(context_df, train_transform, val_transform, test_transform, df_train, df_val, df_test)
create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)

Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders, and returns it.

Parameters:
  • batch_size (int, optional) – The batch size to use for the dataloader. By default 16.

  • sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset. By default “default”.

  • shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.

  • num_workers (int, optional) – The number of worker subprocesses to use for loading data. By default 0.

  • **kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.

Returns:

Dictionary containing dataloaders.

Return type:

Dict

Notes

sampler will only be applied to the training dataset (datasets[“train”]).
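A typical sequence of calls, using only the methods and defaults documented on this page, might look like the sketch below. The wrapper function build_dataloaders and its arguments are hypothetical, and constructing AnnotationsLoader with no arguments is an assumption, since no constructor signature is listed here. The import is placed inside the function so that the sketch can be read and loaded without MapReader installed.

```python
def build_dataloaders(annotations_csv, images_dir=None):
    """Hypothetical wrapper chaining the documented AnnotationsLoader methods."""
    # Imported lazily; requires MapReader to be installed when the function is called.
    from mapreader.classify.load_annotations import AnnotationsLoader

    loader = AnnotationsLoader()  # assumption: no constructor arguments are needed

    # Load the annotations and drop rows whose image files cannot be found.
    loader.load(annotations_csv, images_dir=images_dir, remove_broken=True)
    loader.print_unique_labels()

    # Stratified 70/15/15 split, stored in loader.datasets.
    loader.create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364)

    # The sampler is only applied to the training dataset.
    return loader.create_dataloaders(batch_size=16, sampler="default", shuffle=False)
```

Calling build_dataloaders("annotations.csv") (a placeholder path) would return the dictionary of dataloaders that is also stored in self.dataloaders.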