mapreader.classify.load_annotations
Classes
AnnotationsLoader – A class for loading annotations and preparing datasets and dataloaders for use in training/validation of a model.
Module Contents
- class mapreader.classify.load_annotations.AnnotationsLoader
A class for loading annotations and preparing datasets and dataloaders for use in training/validation of a model.
- annotations
- labels_map
- reviewed
- patch_paths_col = None
- label_col = None
- datasets = None
- load(annotations, labels_map=None, delimiter=',', images_dir=None, remove_broken=True, ignore_broken=False, patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)
Loads annotations from a CSV/TSV/geojson file, a pandas DataFrame or a geopandas GeoDataFrame. Sets the patch_paths_col and label_col attributes.
- Parameters:
annotations (str | pathlib.Path | pd.DataFrame | gpd.GeoDataFrame) – The annotations. Can either be the path to a CSV/TSV/geojson file, a pandas DataFrame or a geopandas GeoDataFrame.
labels_map (Optional[dict], optional) – A dictionary mapping labels to indices. If not provided, labels will be mapped to indices based on the order in which they appear in the annotations dataframe. By default None.
delimiter (str, optional) – The delimiter to use when loading the csv file as a DataFrame, by default “,”.
images_dir (str, optional) – The path to the directory in which patches are stored. This argument should be passed if image paths are different from the path saved in existing annotations. If None, no updates will be made to the image paths in the annotations DataFrame/csv. By default None.
remove_broken (bool, optional) – Whether to remove annotations with broken image paths. If False, annotations with broken paths will remain in the annotations DataFrame and may cause issues! By default True.
ignore_broken (bool, optional) – Whether to ignore broken image paths (only valid if remove_broken=False). If True, annotations with broken paths will remain in the annotations DataFrame and no error will be raised. This may cause issues! If False, annotations with broken paths will raise an error. By default False.
patch_paths_col (str, optional) – The name of the column containing the image paths, by default “image_path”.
label_col (str, optional) – The name of the column containing the image labels, by default “label”.
append (bool, optional) – Whether to append the annotations to a pre-existing annotations DataFrame. If False, the existing DataFrame will be overwritten. By default True.
scramble_frame (bool, optional) – Whether to shuffle the rows of the DataFrame, by default False.
reset_index (bool, optional) – Whether to reset the index of the DataFrame (e.g. after shuffling), by default False.
- Raises:
ValueError – If annotations is passed as something other than a string or pd.DataFrame.
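The default labels_map behaviour and the remove_broken / ignore_broken combinations described above can be sketched in plain Python. This is a minimal illustration, not MapReader's implementation: the helper names build_labels_map and filter_broken are hypothetical, a "broken" path is assumed to mean a file that does not exist on disk, ValueError is an assumed error type (the text above only says an error is raised), and the labels are made up.

```python
import os
import tempfile


def build_labels_map(labels):
    """Map each label to an index in order of first appearance (hypothetical helper)."""
    labels_map = {}
    for label in labels:
        if label not in labels_map:
            labels_map[label] = len(labels_map)
    return labels_map


def filter_broken(rows, remove_broken=True, ignore_broken=False,
                  patch_paths_col="image_path"):
    """Apply the documented remove_broken / ignore_broken rules to a list of dict rows.

    Assumption: a path is "broken" when no file exists at that location.
    """
    broken = [row for row in rows if not os.path.exists(row[patch_paths_col])]
    if not broken:
        return rows
    if remove_broken:
        # Drop every annotation whose image file cannot be found.
        return [row for row in rows if os.path.exists(row[patch_paths_col])]
    if ignore_broken:
        # Keep the broken rows and carry on silently (may cause issues later).
        return rows
    # remove_broken=False and ignore_broken=False: fail loudly.
    # ValueError is an assumed error type for this sketch.
    raise ValueError(f"{len(broken)} annotation(s) point to missing image files.")


# Build one real file and one missing path to exercise the rules.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "patch-0.png")
open(good_path, "wb").close()

rows = [
    {"image_path": good_path, "label": "building"},
    {"image_path": os.path.join(tmp_dir, "missing.png"), "label": "no_building"},
]

labels_map = build_labels_map(row["label"] for row in rows)
kept = filter_broken(rows)  # default: broken rows are removed
```

Passing an explicit labels_map to load() avoids depending on row order, which matters if the annotations are shuffled or appended to between runs.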
- show_patch(patch_id)
Display a patch and its label.
- Parameters:
patch_id (str) – The image ID of the patch to show.
- Return type:
None
- print_unique_labels()
Prints the unique labels found in the annotations.
- Raises:
ValueError – If no annotations are found.
- Return type:
None
- review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')
Perform image review on annotations and update labels for a given label or all labels.
- Parameters:
label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed, by default None.
chunks (int, optional) – The number of images to display at a time, by default 24.
num_cols (int, optional) – The number of columns in the display, by default 8.
exclude_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – A DataFrame of images to exclude from review, by default None.
include_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – A DataFrame of images to include for review, by default None.
deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".
- Return type:
None
Notes
This method reviews images with their corresponding labels and allows the user to change the label for each image.
Updated labels are saved in annotations and in a newly created reviewed DataFrame.
If exclude_df is provided, images found in this df are skipped in the review process.
If include_df is provided, only images found in this df are reviewed.
The reviewed DataFrame is deduplicated based on the deduplicate_col.
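The selection and deduplication rules in these notes can be illustrated with plain Python lists. This is a sketch under stated assumptions, not the method's actual code: select_for_review and deduplicate are hypothetical helpers, the include/exclude DataFrames are simplified to sets of image IDs, matching is assumed to happen on an "image_id" column, and keeping the most recent duplicate is an assumption (the notes do not say which duplicate survives).

```python
def select_for_review(annotations, label_to_review=None, exclude_ids=None,
                      include_ids=None, label_col="label", id_col="image_id"):
    """Return the rows that would be shown for review (hypothetical helper)."""
    selected = []
    for row in annotations:
        if label_to_review is not None and row[label_col] != label_to_review:
            continue  # only the target label is reviewed
        if exclude_ids is not None and row[id_col] in exclude_ids:
            continue  # stands in for exclude_df: skip these images
        if include_ids is not None and row[id_col] not in include_ids:
            continue  # stands in for include_df: review only these images
        selected.append(row)
    return selected


def deduplicate(reviewed, deduplicate_col="image_id"):
    """Keep one row per value of deduplicate_col.

    Assumption: the most recently added row wins.
    """
    latest = {}
    for row in reviewed:
        latest[row[deduplicate_col]] = row
    return list(latest.values())


annotations = [
    {"image_id": "patch-0", "label": "building"},
    {"image_id": "patch-1", "label": "no_building"},
    {"image_id": "patch-2", "label": "building"},
    {"image_id": "patch-3", "label": "building"},
]

to_review = select_for_review(
    annotations, label_to_review="building", exclude_ids={"patch-0"}
)
only_included = select_for_review(annotations, include_ids={"patch-1", "patch-2"})

# The same patch reviewed twice: only one entry should remain afterwards.
reviewed = [
    {"image_id": "patch-2", "label": "no_building"},
    {"image_id": "patch-3", "label": "building"},
    {"image_id": "patch-2", "label": "building"},
]
reviewed_unique = deduplicate(reviewed)
```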
- show_sample(label_to_show, num_samples=9)
Show a random sample of images with the specified label (label_to_show).
- Parameters:
label_to_show (str) – The label of the images to show.
num_samples (int | None, optional) – The number of images to show. If None, all images with the specified label will be shown. Default is 9.
- Return type:
None
- create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test', context_datasets=False, context_df=None)
Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.
- Parameters:
frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.
frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.
frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.
random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.
train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.
val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.
test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.
context_datasets (bool, optional) – Whether to create context datasets or not. By default False.
context_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame, optional) – The DataFrame containing all patches if using context datasets. Used to create context images. By default None.
- Raises:
ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.
- Return type:
None
Notes
This method saves the split datasets as a dictionary in self.datasets.
The split follows the fractional ratios provided by the user, and each subset is stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in the column). It performs this splitting by running train_test_split() twice.
See PatchDataset for more information on transforms.
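The two-pass stratified split described in these notes can be sketched without any dependencies. This is an illustrative stand-in, not the method's code: it does not call scikit-learn's train_test_split(), the helpers stratified_split and three_way_split are hypothetical, and rounding each label group independently is a simplification, so subset sizes may differ slightly from what the real method produces.

```python
import math
import random
from collections import defaultdict


def stratified_split(rows, frac, label_col="label", seed=1364):
    """Split rows in two, sending roughly `frac` of every label group to the first part."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_col]].append(row)
    first, second = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        cut = round(len(members) * frac)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


def three_way_split(rows, frac_train=0.7, frac_val=0.15, frac_test=0.15, seed=1364):
    """Produce train/val/test subsets by splitting twice (hypothetical helper)."""
    if not math.isclose(frac_train + frac_val + frac_test, 1.0):
        raise ValueError("frac_train, frac_val and frac_test must sum to 1.")
    # First pass: training set versus everything else.
    train, rest = stratified_split(rows, frac_train, seed=seed)
    # Second pass: divide the remainder between validation and test.
    # The fraction is rescaled because `rest` is only (frac_val + frac_test) of the data.
    val, test = stratified_split(rest, frac_val / (frac_val + frac_test), seed=seed)
    return {"train": train, "val": val, "test": test}


# 150 patches labelled "a" and 50 labelled "b" (made-up data).
rows = [{"image_id": f"patch-{i}", "label": "a" if i % 4 else "b"} for i in range(200)]
datasets = three_way_split(rows)
sizes = {name: len(subset) for name, subset in datasets.items()}
```

The rescaling in the second pass is the step that is easy to get wrong: with the default fractions, the remainder must be halved (0.15 / 0.30), not split at 0.15.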
- create_patch_datasets(train_transform, val_transform, test_transform, df_train, df_val, df_test)
- create_patch_context_datasets(context_df, train_transform, val_transform, test_transform, df_train, df_val, df_test)
- create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)
Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders and returns it.
- Parameters:
batch_size (int, optional) – The batch size to use for the dataloader. By default 16.
sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset.
shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.
num_workers (int, optional) – The number of worker subprocesses to use for loading data. By default 0.
**kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.
- Returns:
Dictionary containing dataloaders.
- Return type:
Dict
Notes
sampler will only be applied to the training dataset (datasets["train"]).
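Putting the methods on this page together, a typical workflow might look like the sketch below. It uses only the signatures and defaults listed above; the wrapper function build_dataloaders is hypothetical, and the import is deferred so that defining the function does not require MapReader to be installed.

```python
def build_dataloaders(annotations_path, images_dir=None, batch_size=16):
    """Load annotations, split them and return the dataloaders dictionary.

    Hypothetical convenience wrapper; calling it requires MapReader and an
    annotations file whose columns match the names passed to load().
    """
    from mapreader.classify.load_annotations import AnnotationsLoader

    loader = AnnotationsLoader()
    loader.load(
        annotations_path,
        images_dir=images_dir,
        patch_paths_col="image_path",
        label_col="label",
    )
    loader.print_unique_labels()

    # The three fractions must sum to 1, otherwise a ValueError is raised.
    loader.create_datasets(
        frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364
    )

    # The sampler is only applied to the training dataset.
    return loader.create_dataloaders(
        batch_size=batch_size, sampler="default", shuffle=False, num_workers=0
    )
```

Calling build_dataloaders("annotations.csv") (a hypothetical file name) would return the same dictionary that is stored in self.dataloaders.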