mapreader.classify.load_annotations

Module Contents

Classes

AnnotationsLoader

class mapreader.classify.load_annotations.AnnotationsLoader
load(annotations, delimiter=',', id_col='image_id', patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)

Loads annotations from a CSV file or a DataFrame. Can also be used to set the id_col, patch_paths_col and label_col attributes.

Parameters:
  • annotations (Union[str, pd.DataFrame]) – The annotations. Can either be the path to a csv file or a pandas.DataFrame.

  • delimiter (Optional[str], optional) – The delimiter to use when loading the csv file as a dataframe, by default “,”.

  • id_col (Optional[str], optional) – The name of the column which contains the image IDs, by default “image_id”.

  • patch_paths_col (Optional[str], optional) – The name of the column containing the image paths, by default “image_path”.

  • label_col (Optional[str], optional) – The name of the column containing the image labels, by default “label”.

  • append (Optional[bool], optional) – Whether to append the annotations to a pre-existing annotations dataframe. If False, the existing dataframe is overwritten. By default True.

  • scramble_frame (Optional[bool], optional) – Whether to shuffle the rows of the dataframe, by default False.

  • reset_index (Optional[bool], optional) – Whether to reset the index of the dataframe (e.g. after shuffling), by default False.

Raises:

ValueError – If annotations is passed as something other than a string or pd.DataFrame.
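As a rough illustration of the input that the default arguments expect, the sketch below parses a small comma-delimited table with the standard library's csv module. It does not call MapReader: the file contents, the example paths and the read_annotations helper are hypothetical, and only the default column names and the type check mirror the documentation above.

```python
import csv
import io

# Hypothetical annotations table using the documented default column names.
CSV_TEXT = """image_id,image_path,label
patch-0-0.png,patches/patch-0-0.png,railspace
patch-0-1.png,patches/patch-0-1.png,no
patch-1-0.png,patches/patch-1-0.png,railspace
"""


def read_annotations(annotations, delimiter=",", id_col="image_id",
                     patch_paths_col="image_path", label_col="label"):
    """Illustrative stand-in for a loader (not MapReader's implementation).

    Accepts CSV text (standing in for a file path) or a list of row dicts
    (standing in for a DataFrame).
    """
    if isinstance(annotations, str):
        rows = list(csv.DictReader(io.StringIO(annotations), delimiter=delimiter))
    elif isinstance(annotations, list):
        rows = [dict(row) for row in annotations]
    else:
        # Mirrors the documented ValueError for unsupported input types.
        raise ValueError("annotations must be CSV text or a list of row dicts")
    # Check that the three named columns are present in every row.
    for row in rows:
        for col in (id_col, patch_paths_col, label_col):
            if col not in row:
                raise ValueError(f"missing column: {col}")
    return rows


rows = read_annotations(CSV_TEXT)
labels = sorted({row["label"] for row in rows})
```

If a file uses different headers, the same idea applies: pass the actual header names through id_col, patch_paths_col and label_col rather than renaming the columns in the file.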

show_patch(patch_id)

Display a patch and its label.

Parameters:

patch_id (str) – The image ID of the patch to show.

Return type:

None

print_unique_labels()

Prints the unique labels.

Raises:

ValueError – If no annotations are found.

Return type:

None

review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')

Perform image review on annotations and update labels for a given label or all labels.

Parameters:
  • label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed, by default None.

  • chunks (int, optional) – The number of images to display at a time, by default 24.

  • num_cols (int, optional) – The number of columns in the display, by default 8.

  • exclude_df (pandas.DataFrame, optional) – A DataFrame of images to exclude from review, by default None.

  • include_df (pandas.DataFrame, optional) – A DataFrame of images to include for review, by default None.

  • deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".

Return type:

None

Notes

This method reviews images with their corresponding labels and allows the user to change the label for each image.

Updated labels are saved in self.annotations and in a newly created self.reviewed DataFrame. If exclude_df is provided, images found in this df are skipped in the review process. If include_df is provided, only images found in this df are reviewed. The self.reviewed DataFrame is deduplicated based on the deduplicate_col.
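The filtering and paging described above can be sketched in plain Python. This is only an assumption about how the pieces fit together, not MapReader's widget code: plan_review, include_ids and exclude_ids are hypothetical names, with lists of IDs standing in for include_df and exclude_df.

```python
def plan_review(image_ids, chunks=24, num_cols=8, exclude_ids=None, include_ids=None):
    """Return pages of IDs; each page is a list of rows with at most num_cols IDs."""
    selected = list(image_ids)
    if include_ids is not None:
        # Only images listed for inclusion are reviewed.
        include = set(include_ids)
        selected = [i for i in selected if i in include]
    if exclude_ids is not None:
        # Images listed for exclusion are skipped.
        exclude = set(exclude_ids)
        selected = [i for i in selected if i not in exclude]
    pages = []
    for start in range(0, len(selected), chunks):
        page = selected[start:start + chunks]
        # Lay the page out as a grid that is num_cols images wide.
        rows = [page[r:r + num_cols] for r in range(0, len(page), num_cols)]
        pages.append(rows)
    return pages


ids = [f"patch-{n}.png" for n in range(50)]
pages = plan_review(ids, exclude_ids=["patch-0.png", "patch-1.png"])
```

With the default chunks of 24 and num_cols of 8, each full page in this sketch is a grid of three rows of eight images.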

show_sample(label_to_show, num_samples=9)

Show a random sample of images with the specified label (label_to_show).

Parameters:
  • label_to_show (str) – The label of the images to show.

  • num_samples (Optional[int], optional) – The number of images to show. If None, all images with the specified label will be shown. By default 9.

Return type:

None

create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test')

Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.

Parameters:
  • frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.

  • frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.

  • frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.

  • random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.

  • train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.

  • val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.

  • test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.

Raises:

ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.

Return type:

None

Notes

This method saves the split datasets as a dictionary in self.datasets.

The split follows the fractions provided by the user, and each subset is stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in that column). The splitting is performed by running train_test_split() twice.

See PatchDataset for more information on transforms.
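The two-stage split can be sketched as follows. To keep the example dependency-free, it swaps the documented train_test_split() calls for a simple per-label shuffle written with the standard library, so it illustrates the arithmetic rather than reproducing MapReader's code. The helper names stratified_split and three_way_split and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, frac_first, seed):
    """Split items into two lists, keeping label proportions roughly equal in both."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    first, second = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = round(len(group) * frac_first)
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second


def three_way_split(items, labels, frac_train=0.7, frac_val=0.15,
                    frac_test=0.15, seed=1364):
    # Mirrors the documented check that the fractions add up to 1.
    if abs(frac_train + frac_val + frac_test - 1.0) > 1e-9:
        raise ValueError("frac_train, frac_val and frac_test must sum to 1")
    label_of = dict(zip(items, labels))
    # First split: training set versus everything else.
    train, rest = stratified_split(items, labels, frac_train, seed)
    # Second split: divide the remainder using val's share of (val + test).
    remainder = frac_val + frac_test
    rel_val = frac_val / remainder if remainder > 0 else 0.0
    val, test = stratified_split(rest, [label_of[i] for i in rest], rel_val, seed)
    return {"train": train, "val": val, "test": test}


items = list(range(200))
labels = ["rail" if n % 4 == 0 else "no_rail" for n in items]
splits = three_way_split(items, labels)
```

The point to note is the second fraction: with the defaults, the remainder after the first split is 30% of the data, so validation takes 0.15 / 0.30 = 0.5 of that remainder rather than 0.15 of it.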

create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)

Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders and returns it.

Parameters:
  • batch_size (int, optional) – The batch size to use for the dataloader. By default 16.

  • sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset. By default “default”.

  • shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.

  • num_workers (int, optional) – The number of worker subprocesses to use for loading data. By default 0, which loads data in the main process.

  • **kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.

Returns:

Dictionary containing dataloaders.

Return type:

Dict

Notes

sampler will only be applied to the training dataset (datasets[“train”]).
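The note above can be pictured with a small stand-in written in plain Python. Lists of batches take the place of PyTorch DataLoader objects, and build_loaders, make_batches and shuffled are hypothetical helpers, so this is a sketch of the behaviour rather than the library's implementation.

```python
import random


def make_batches(indices, batch_size=16, sampler=None):
    """Group indices into batches, optionally reordering them with a sampler first."""
    order = list(sampler(indices)) if sampler is not None else list(indices)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def build_loaders(datasets, batch_size=16, train_sampler=None):
    """Build one batch list per subset; the sampler is used for "train" only."""
    loaders = {}
    for name, indices in datasets.items():
        sampler = train_sampler if name == "train" else None
        loaders[name] = make_batches(indices, batch_size, sampler)
    return loaders


def shuffled(indices, seed=0):
    """A toy sampler that returns the indices in a seeded random order."""
    rng = random.Random(seed)
    order = list(indices)
    rng.shuffle(order)
    return order


datasets = {
    "train": list(range(40)),
    "val": list(range(40, 50)),
    "test": list(range(50, 60)),
}
loaders = build_loaders(datasets, batch_size=16, train_sampler=shuffled)
```

In this sketch the validation and test batches keep their original order, while only the training batches are drawn through the sampler.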