mapreader.classify.load_annotations
Module Contents
Classes
- class mapreader.classify.load_annotations.AnnotationsLoader
- load(annotations, delimiter=',', images_dir=None, remove_broken=True, ignore_broken=False, patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)
Loads annotations from a csv file or dataframe and can be used to set the patch_paths_col and label_col attributes.
- Parameters:
annotations (Union[str, pd.DataFrame]) – The annotations. Can either be the path to a csv file or a pandas.DataFrame.
delimiter (Optional[str], optional) – The delimiter to use when loading the csv file as a dataframe, by default “,”.
images_dir (Optional[str], optional) – The path to the directory in which patches are stored. This argument should be passed if image paths are different from the path saved in annotations dataframe/csv. If None, no updates will be made to the image paths in the annotations dataframe/csv. By default None.
remove_broken (Optional[bool], optional) – Whether to remove annotations with broken image paths. If False, annotations with broken paths will remain in annotations dataframe and may cause issues! By default True.
ignore_broken (Optional[bool], optional) – Whether to ignore broken image paths (only valid if remove_broken=False). If True, annotations with broken paths will remain in annotations dataframe and no error will be raised. This may cause issues! If False, annotations with broken paths will raise error. By default, False.
patch_paths_col (Optional[str], optional) – The name of the column containing the image paths, by default “image_path”.
label_col (Optional[str], optional) – The name of the column containing the image labels, by default “label”.
append (Optional[bool], optional) – Whether to append the annotations to a pre-existing annotations dataframe. If False, the existing dataframe will be overwritten. By default True.
scramble_frame (Optional[bool], optional) – Whether to shuffle the rows of the dataframe, by default False.
reset_index (Optional[bool], optional) – Whether to reset the index of the dataframe (e.g. after shuffling), by default False.
- Raises:
ValueError – If annotations is passed as something other than a string or pd.DataFrame.
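A minimal sketch of the annotations format load() expects, using the default column names documented above (the patch file names and labels here are hypothetical; the AnnotationsLoader call is shown commented out since it requires the patch files to exist on disk):

```python
import pandas as pd

# The default column names expected by load():
# "image_path" (patch_paths_col) and "label" (label_col).
annotations = pd.DataFrame(
    {
        "image_path": ["patch-0-0-100-100.png", "patch-100-0-200-100.png"],
        "label": ["railspace", "no_railspace"],
    }
)

# With the patch files on disk, this DataFrame (or a csv with the same
# columns) can be passed directly:
# from mapreader.classify.load_annotations import AnnotationsLoader
# loader = AnnotationsLoader()
# loader.load(annotations, images_dir="./patches/")
```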
- show_patch(patch_id)
Display a patch and its label.
- Parameters:
patch_id (str) – The image ID of the patch to show.
- Return type:
None
- print_unique_labels()
Prints unique labels.
- Raises:
ValueError – If no annotations are found.
- Return type:
None
- review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')
Perform image review on annotations and update labels for a given label or all labels.
- Parameters:
label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed. By default None.
chunks (int, optional) – The number of images to display at a time, by default 24.
num_cols (int, optional) – The number of columns in the display, by default 8.
exclude_df (pandas.DataFrame, optional) – A DataFrame of images to exclude from review, by default None.
include_df (pandas.DataFrame, optional) – A DataFrame of images to include for review, by default None.
deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".
- Return type:
None
Notes
This method reviews images with their corresponding labels and allows the user to change the label for each image. Updated labels are saved in self.annotations and in a newly created self.reviewed DataFrame. If exclude_df is provided, images found in this df are skipped in the review process. If include_df is provided, only images found in this df are reviewed. The self.reviewed DataFrame is deduplicated based on the deduplicate_col.
- show_sample(label_to_show, num_samples=9)
Show a random sample of images with the specified label.
- Parameters:
label_to_show (str) – The label of the images to show.
num_samples (int, optional) – The number of images to show. If None, all images with the specified label will be shown. By default 9.
- Return type:
None
- create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test', context_datasets=False, context_df=None)
Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.
- Parameters:
frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.
frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.
frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.
random_state (int, optional) – Random seed to ensure reproducibility. By default 1364.
train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are "train", "test" or "val", or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default "train".
val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are "train", "test" or "val", or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default "val".
test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are "train", "test" or "val", or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default "test".
context_datasets (bool, optional) – Whether to create context datasets. By default False.
context_df (str or pandas.DataFrame, optional) – The dataframe containing all patches if using context datasets. Used to create context images. By default None.
- Raises:
ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.
- Return type:
None
Notes
This method saves the split datasets as a dictionary in self.datasets.
The dataset is split following the fractional ratios provided by the user, with each subset stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in that column). The splitting is performed by running train_test_split() twice.
See PatchDataset for more information on transforms.
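The two-stage stratified split described in the notes above can be sketched with plain pandas (this mirrors the idea, not MapReader's actual implementation; the data is hypothetical):

```python
import pandas as pd

# 40 annotated patches with two labels (hypothetical data).
df = pd.DataFrame({
    "image_path": [f"patch-{i}.png" for i in range(40)],
    "label": ["a"] * 20 + ["b"] * 20,
})

# First split: take the train fraction, stratified by label.
train = df.groupby("label").sample(frac=0.7, random_state=1364)
rest = df.drop(train.index)

# Second split: divide the remaining 30% into val and test
# (0.15 / 0.30 = half of the remainder each), again per label.
val = rest.groupby("label").sample(frac=0.5, random_state=1364)
test = rest.drop(val.index)
```

Each subset keeps the same relative label frequencies as the full dataframe, which is what stratification guarantees.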
- create_patch_datasets(train_transform, val_transform, test_transform, df_train, df_val, df_test)
- create_patch_context_datasets(context_df, train_transform, val_transform, test_transform, df_train, df_val, df_test)
- create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)
Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders and returns it.
- Parameters:
batch_size (int, optional) – The batch size to use for the dataloader. By default 16.
sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset. By default "default".
shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.
num_workers (int, optional) – The number of worker threads to use for loading data. By default 0.
**kwargs – Additional keyword arguments to pass to PyTorch's DataLoader constructor.
- Returns:
Dictionary containing dataloaders.
- Return type:
Dict
Notes
sampler will only be applied to the training dataset (datasets["train"]).