mapreader

Subpackages

Package Contents

Classes

MapImages

Class to manage a collection of image paths and construct image objects.

SheetDownloader

A class to download map sheets using metadata.

Downloader

A class to download maps (without using metadata).

AnnotationsLoader

PatchDataset

An abstract class representing a Dataset.

PatchContextDataset

An abstract class representing a Dataset.

ClassifierContainer

Annotator

Annotator class for annotating patches with labels.

Functions

loader([path_images, tree_level, parent_path])

Creates a MapImages class to manage a collection of image paths and construct image objects.

load_patches(patch_paths[, patch_file_ext, ...])

Creates a MapImages class to manage a collection of image paths and construct image objects, then loads patch images into it.

create_polygon_from_latlons(min_lat, min_lon, max_lat, ...)

Creates a polygon from latitudes and longitudes.

create_line_from_latlons(lat1_lon1, lat2_lon2)

Creates a line between two points.

class mapreader.MapImages(path_images=None, file_ext=False, tree_level='parent', parent_path=None, **kwargs)

Class to manage a collection of image paths and construct image objects.

Parameters:
  • path_images (str or None, optional) – Path to the directory containing images (accepts wildcards). By default, None.

  • file_ext (str or False, optional) – The file extension of the image files to be loaded, ignored if file types are specified in path_images (e.g. with "./path/to/dir/*png"). By default False.

  • tree_level (str, optional) – Level of the image hierarchy to construct. The value can be either "parent" (default) or "patch".

  • parent_path (str, optional) – Path to parent images (if applicable), by default None.

  • **kwargs (dict, optional) – Additional keyword arguments to be passed to the _images_constructor method.

path_images

List of paths to the image files.

Type:

list

images

A dictionary containing the constructed image data. It has two levels of hierarchy, "parent" and "patch", depending on the value of the tree_level parameter.

Type:

dict

add_metadata(metadata, index_col=0, delimiter=',', columns=None, tree_level='parent', ignore_mismatch=False)

Add metadata information to the images dictionary.

Parameters:
  • metadata (str or pandas.DataFrame) – Path to a csv (or similar), xls or xlsx file or a pandas DataFrame that contains the metadata information.

  • index_col (int or str, optional) –

    Column to use as the index when reading the file and converting into a pandas.DataFrame. Accepts column indices or column names. By default 0 (first column).

    Only used if a file path is provided as the metadata parameter. Ignored if columns parameter is passed.

  • delimiter (str, optional) –

    Delimiter used in the csv file, by default ",".

    Only used if a csv file path is provided as the metadata parameter.

  • columns (list, optional) – List of columns indices or names to add to MapImages. If None is passed, all columns will be used. By default None.

  • tree_level (str, optional) – Determines which images dictionary ("parent" or "patch") to add the metadata to, by default "parent".

  • ignore_mismatch (bool, optional) – If True, mismatches between the metadata and existing information are ignored; if False, an error is raised when mismatching information is passed. By default False.

Raises:

ValueError

If metadata is not a pandas DataFrame or a csv, xls or xlsx file path.

If ‘name’ or ‘image_id’ is not one of the columns in the metadata.

Return type:

None

Notes

Your metadata file must contain a column containing the image IDs (filenames) of your images. This column should be named either name or image_id.

Existing information in your MapImages object will be overwritten if there are overlapping column headings in your metadata file/dataframe.
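
As an illustration of the ID-matching rule above, here is a minimal plain-Python sketch (toy data, not MapReader's implementation) of how metadata rows are matched to images by their name/image_id column:

```python
import csv
import io

# Toy metadata file with the required "image_id" column.
metadata_csv = io.StringIO(
    "image_id,county,published\n"
    "map_01.png,Kent,1891\n"
    "map_02.png,Surrey,1894\n"
)

rows = list(csv.DictReader(metadata_csv))

# The metadata must contain a "name" or "image_id" column.
id_col = next((c for c in ("name", "image_id") if c in rows[0]), None)
if id_col is None:
    raise ValueError("Metadata must contain a 'name' or 'image_id' column.")

# Merge each metadata row into a toy images dictionary, keyed by image ID;
# overlapping keys would be overwritten, as noted above.
images = {"map_01.png": {}, "map_02.png": {}}
for row in rows:
    if row[id_col] in images:
        images[row[id_col]].update(row)
```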

show_sample(num_samples, tree_level='patch', random_seed=65, **kwargs)

Display a sample of images from a particular level in the image hierarchy.

Parameters:
  • num_samples (int) – The number of images to display.

  • tree_level (str, optional) – The level of the hierarchy to display images from, which can be "patch" or "parent". By default “patch”.

  • random_seed (int, optional) – The random seed to use for reproducibility. Default is 65.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to matplotlib.pyplot.figure().

Returns:

The generated figure.

Return type:

matplotlib.Figure

list_parents()

Return list of all parents

Return type:

list[str]

list_patches()

Return list of all patches

Return type:

list[str]

add_shape(tree_level='parent')

Add a shape to each image in the specified level of the image hierarchy.

Parameters:

tree_level (str, optional) – The level of the hierarchy to add shapes to, either "parent" (default) or "patch".

Return type:

None

Notes

The method runs mapreader.load.images.MapImages._add_shape_id() for each image present at the tree_level provided.

add_coords_from_grid_bb(verbose=False)

Add coordinates to each parent image by converting its grid bounding box.

Parameters:

verbose (bool, optional) – Whether to print verbose outputs, by default False.

Return type:

None

add_coord_increments(verbose=False)

Adds coordinate increments to each image at the parent level.

Parameters:

verbose (bool, optional) – Whether to print verbose outputs, by default False.

Return type:

None

Notes

The method runs mapreader.load.images.MapImages._add_coord_increments_id() for each image present at the parent level, which calculates pixel-wise delta longitude (dlon) and delta latitude (dlat) for the image and adds the data to it.
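
The pixel-wise increments can be sketched as degrees-per-pixel arithmetic (an assumption about the calculation based on the description above, not MapReader's code):

```python
# Given an image's bounds as (min_lon, min_lat, max_lon, max_lat) and its
# size in pixels, dlon/dlat are the degrees covered by one pixel.
def coord_increments(bounds, width_px, height_px):
    min_lon, min_lat, max_lon, max_lat = bounds
    dlon = (max_lon - min_lon) / width_px
    dlat = (max_lat - min_lat) / height_px
    return dlon, dlat

# 1 degree of longitude over 1000 px and 1 degree of latitude over 500 px.
dlon, dlat = coord_increments((-0.5, 51.0, 0.5, 52.0), 1000, 500)
```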

add_patch_coords(verbose=False)

Add coordinates to all patches in patches dictionary.

Parameters:

verbose (bool, optional) – Whether to print verbose outputs. By default, False

Return type:

None

add_patch_polygons(verbose=False)

Add polygon to all patches in patches dictionary.

Parameters:

verbose (bool, optional) – Whether to print verbose outputs. By default, False

Return type:

None

add_center_coord(tree_level='patch', verbose=False)

Adds center coordinates to each image at the specified tree level.

Parameters:
  • tree_level (str, optional) – The tree level where the center coordinates will be added. It can be either "parent" or "patch" (default).

  • verbose (bool, optional) – Whether to print verbose outputs, by default False.

Return type:

None

Notes

The method runs mapreader.load.images.MapImages._add_center_coord_id() for each image present at the tree_level provided, which calculates central longitude and latitude (center_lon and center_lat) for the image and adds the data to it.

patchify_all(method='pixel', patch_size=100, tree_level='parent', path_save=None, add_to_parents=True, square_cuts=False, resize_factor=False, output_format='png', rewrite=False, verbose=False)

Patchify all images in the specified tree_level and (if add_to_parents=True) add the patches to the MapImages instance’s images dictionary.

Parameters:
  • method (str, optional) – Method used to patchify images, choices between "pixel" (default) and "meters" or "meter".

  • patch_size (int, optional) – Number of pixels/meters in both x and y to use for slicing, by default 100.

  • tree_level (str, optional) – Tree level, choices between "parent" and "patch", by default "parent".

  • path_save (str, optional) – Directory to save the patches. If None, will be set as f”patches_{patch_size}_{method}” (e.g. “patches_100_pixel”). By default None.

  • add_to_parents (bool, optional) – If True, patches will be added to the MapImages instance’s images dictionary, by default True.

  • square_cuts (bool, optional) – If True, all patches will have the same number of pixels in x and y, by default False.

  • resize_factor (bool, optional) – If True, resize the images before patchifying, by default False.

  • output_format (str, optional) – Format to use when writing image files, by default "png".

  • rewrite (bool, optional) – If True, existing patches will be rewritten, by default False.

  • verbose (bool, optional) – If True, progress updates will be printed throughout, by default False.

Return type:

None
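
The pixel-based slicing can be sketched as follows (illustrative only; here edge patches are simply clipped to the image bounds, an assumption consistent with the square_cuts description above):

```python
# Slice an image of a given pixel size into patch_size x patch_size windows;
# edge patches are clipped to the image bounds.
def patch_bounds(img_width, img_height, patch_size):
    patches = []
    for min_y in range(0, img_height, patch_size):
        for min_x in range(0, img_width, patch_size):
            patches.append((
                min_x,
                min_y,
                min(min_x + patch_size, img_width),
                min(min_y + patch_size, img_height),
            ))
    return patches

bounds = patch_bounds(250, 100, 100)  # three patches; the last is clipped
```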

calc_pixel_stats(parent_id=None, calc_mean=True, calc_std=True, verbose=False)

Calculate the mean and standard deviation of pixel values for all channels of all patches of a given parent image. Store the results in the MapImages instance’s images dictionary.

Parameters:
  • parent_id (str or None, optional) – The ID of the parent image to calculate pixel stats for. If None, calculate pixel stats for all parent images. By default, None

  • calc_mean (bool, optional) – Whether to calculate mean pixel values. By default, True.

  • calc_std (bool, optional) – Whether to calculate standard deviation of pixel values. By default, True.

  • verbose (bool, optional) – Whether to print verbose outputs. By default, False.

Return type:

None

Notes

  • Pixel stats are calculated for patches of the parent image specified by parent_id.

  • If parent_id is None, pixel stats are calculated for all parent images in the object.

  • If mean or standard deviation of pixel values has already been calculated for a patch, the calculation is skipped.

  • Pixel stats are stored in the images attribute of the MapImages instance, under the patch key for each patch.

  • If no patches are found for a parent image, a warning message is displayed and the method moves on to the next parent image.
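
The per-channel statistics can be sketched in plain Python (toy data; the mean_pixel_*/std_pixel_* key names are illustrative, not MapReader's exact schema):

```python
import statistics

# Toy 2x2 RGB patch: channel -> flattened pixel values in [0, 1].
patch = {
    "R": [0.0, 0.5, 0.5, 1.0],
    "G": [0.2, 0.2, 0.2, 0.2],
    "B": [0.0, 0.0, 1.0, 1.0],
}

stats = {}
for channel, pixels in patch.items():
    stats[f"mean_pixel_{channel}"] = statistics.fmean(pixels)
    # Population standard deviation of the pixel values.
    stats[f"std_pixel_{channel}"] = statistics.pstdev(pixels)
```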

convert_images(save=False, save_format='csv', delimiter=',')

Convert the MapImages instance’s images dictionary into pandas DataFrames for easy manipulation.

Parameters:
  • save (bool, optional) – Whether to save the dataframes as files. By default False.

  • save_format (str, optional) – If save = True, the file format to use when saving the dataframes. Options of csv (“csv”) or excel (“excel” or “xlsx”). By default, “csv”.

  • delimiter (str, optional) – The delimiter to use when saving the dataframe. By default ",".

Returns:

The method returns a tuple of two DataFrames: One for the parent images and one for the patch images.

Return type:

tuple of two pandas DataFrames
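
A sketch of what the conversion produces, using a toy two-level images dictionary like the one described above (illustrative data and patch names, not MapReader's implementation):

```python
import pandas as pd

# Toy images dictionary with "parent" and "patch" levels.
images = {
    "parent": {
        "map_01.png": {"coordinates": (-0.5, 51.0, 0.5, 52.0)},
    },
    "patch": {
        "patch-0-0-100-100-#map_01.png#": {"parent_id": "map_01.png"},
        "patch-100-0-200-100-#map_01.png#": {"parent_id": "map_01.png"},
    },
}

# One DataFrame per level, indexed by image ID.
parent_df = pd.DataFrame.from_dict(images["parent"], orient="index")
patch_df = pd.DataFrame.from_dict(images["patch"], orient="index")
```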

show_parent(parent_id, column_to_plot=None, **kwargs)

A wrapper method for .show() which plots all patches of a specified parent (parent_id).

Parameters:
  • parent_id (str) – ID of the parent image to be plotted.

  • column_to_plot (str, optional) – Column whose values will be plotted on patches, by default None.

  • **kwargs (dict) – Keyword arguments to pass to the show() method. See the help text for show() for more information.

Returns:

A list of figures created by the method.

Return type:

list

Notes

This is a wrapper method. See the documentation of the mapreader.load.images.MapImages.show() method for more detail.

show(image_ids, column_to_plot=None, figsize=(10, 10), plot_parent=True, patch_border=True, border_color='r', vmin=None, vmax=None, alpha=1.0, cmap='viridis', discrete_cmap=256, plot_histogram=False, save_kml_dir=False, image_width_resolution=None, kml_dpi_image=None)

Plot images from a list of image_ids.

Parameters:
  • image_ids (str or list) – Image ID or list of image IDs to be plotted.

  • column_to_plot (str, optional) – Column whose values will be plotted on patches, by default None.

  • plot_parent (bool, optional) – If True, parent image will be plotted in background, by default True.

  • figsize (tuple, optional) – The size of the figure to be plotted. By default, (10,10).

  • patch_border (bool, optional) – If True, a border will be placed around each patch, by default True.

  • border_color (str, optional) – The color of the border. Default is "r".

  • vmin (float, optional) – The minimum value for the colormap. If None, will be set to minimum value in column_to_plot, by default None.

  • vmax (float, optional) – The maximum value for the colormap. If None, will be set to the maximum value in column_to_plot, by default None.

  • alpha (float, optional) – Transparency level for plotting values, with floating point values ranging from 0.0 (transparent) to 1.0 (opaque), by default 1.0.

  • cmap (str, optional) – Color map used to visualize chosen column_to_plot values, by default "viridis".

  • discrete_cmap (int, optional) – Number of discrete colors to use in color map, by default 256.

  • plot_histogram (bool, optional) – If True, plot histograms of the value of images. By default False.

  • save_kml_dir (str or bool, optional) – If True, save KML files of the images. If a string is provided, it is the path to the directory in which to save the KML files. If set to False, no files are saved. By default False.

  • image_width_resolution (int or None, optional) –

    The pixel width to be used for plotting. If None, the resolution is not changed. Default is None.

    Note: Only relevant when tree_level="parent".

  • kml_dpi_image (int or None, optional) – The resolution, in dots per inch, to create KML images when save_kml_dir is specified (as either True or with path). By default None.

Returns:

A list of figures created by the method.

Return type:

list

load_patches(patch_paths, patch_file_ext=False, parent_paths=False, parent_file_ext=False, add_geo_info=False, clear_images=False)

Loads patch images from the given paths and adds them to the images dictionary in the MapImages instance.

Parameters:
  • patch_paths (str) –

    The file path of the patches to be loaded.

    Note: The ``patch_paths`` parameter accepts wildcards.

  • patch_file_ext (str or bool, optional) – The file extension of the patches to be loaded, ignored if file extensions are specified in patch_paths (e.g. with "./path/to/dir/*png") By default False.

  • parent_paths (str or bool, optional) –

    The file path of the parent images to be loaded. If set to False, no parents are loaded. Default is False.

    Note: The ``parent_paths`` parameter accepts wildcards.

  • parent_file_ext (str or bool, optional) – The file extension of the parent images, ignored if file extensions are specified in parent_paths (e.g. with "./path/to/dir/*png") By default False.

  • add_geo_info (bool, optional) – If True, adds geographic information to the parent image. Default is False.

  • clear_images (bool, optional) – If True, clears the images from the images dictionary before loading. Default is False.

Return type:

None

static detect_parent_id_from_path(image_id, parent_delimiter='#')

Detect parent IDs from image_id.

Parameters:
  • image_id (int or str) – ID of patch.

  • parent_delimiter (str, optional) – Delimiter used to separate parent ID when naming patch, by default "#".

Returns:

Parent ID.

Return type:

str

static detect_pixel_bounds_from_path(image_id)

Detects borders from the path assuming patch is named using the following format: ...-min_x-min_y-max_x-max_y-...

Parameters:
  • image_id (int or str) – ID of image.

  • border_delimiter (str, optional) – Delimiter used to separate border values when naming patch image, by default "-".

Returns:

Border (min_x, min_y, max_x, max_y) of image

Return type:

tuple of min_x, min_y, max_x, max_y
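
Both helpers can be sketched from the naming conventions documented above: the parent ID sits between parent_delimiter characters and the pixel bounds appear as -min_x-min_y-max_x-max_y-. This mirrors the documented behaviour, not MapReader's code:

```python
import re

def parent_id_from_path(image_id, parent_delimiter="#"):
    # The parent ID is the segment between the delimiter characters.
    return image_id.split(parent_delimiter)[1]

def pixel_bounds_from_path(image_id):
    # Pixel bounds are embedded as "-min_x-min_y-max_x-max_y-".
    match = re.search(r"-(\d+)-(\d+)-(\d+)-(\d+)-", image_id)
    return tuple(int(g) for g in match.groups())

patch_id = "patch-100-0-200-100-#map_01.png#"
```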

load_parents(parent_paths=False, parent_ids=False, parent_file_ext=False, overwrite=False, add_geo_info=False)

Load parent images from file paths (parent_paths).

If parent_paths is not given but parent_ids is, the parent IDs will be added to the images dictionary without image paths.

Parameters:
  • parent_paths (str or bool, optional) – Path to parent images, by default False.

  • parent_ids (list, str or bool, optional) – ID(s) of parent images. Ignored if parent_paths are specified. By default False.

  • parent_file_ext (str or bool, optional) – The file extension of the parent images, ignored if file extensions are specified in parent_paths (e.g. with "./path/to/dir/*png") By default False.

  • overwrite (bool, optional) – If True, current parents will be overwritten, by default False.

  • add_geo_info (bool, optional) – If True, geographical info will be added to parents, by default False.

Return type:

None

load_df(parent_df=None, patch_df=None, clear_images=True)

Create MapImages instance by loading data from pandas DataFrame(s).

Parameters:
  • parent_df (pandas.DataFrame, optional) – DataFrame containing parents or path to parents, by default None.

  • patch_df (pandas.DataFrame, optional) – DataFrame containing patches, by default None.

  • clear_images (bool, optional) – If True, clear images before reading the dataframes, by default True.

Return type:

None

load_csv(parent_path=None, patch_path=None, clear_images=False, index_col_patch=0, index_col_parent=0, delimiter=',')

Load CSV files containing information about parent and patches, and update the images attribute of the MapImages instance with the loaded data.

Parameters:
  • parent_path (str, optional) – Path to the CSV file containing parent image information.

  • patch_path (str, optional) – Path to the CSV file containing patch information.

  • clear_images (bool, optional) – If True, clear all previously loaded image information before loading new information. Default is False.

  • index_col_patch (int, optional) – Column to set as index for the patch DataFrame, by default 0.

  • index_col_parent (int, optional) – Column to set as index for the parent DataFrame, by default 0.

  • delimiter (str, optional) – The delimiter to use when reading the dataframe. By default ",".

Return type:

None

add_geo_info(target_crs='EPSG:4326', verbose=True)

Add coordinates (reprojected to EPSG:4326) to all parent images using image metadata.

Parameters:
  • target_crs (str, optional) – Projection to convert coordinates into, by default "EPSG:4326".

  • verbose (bool, optional) – Whether to print verbose output, by default True

Return type:

None

Notes

For each image in the parents dictionary, this method calls _add_geo_info_id, which adds coordinates (if present in the image metadata) to the image in the parents dictionary.

save_parents_as_geotiffs(rewrite=False, verbose=False, crs=None)

Save all parents in MapImages instance as geotiffs.

Parameters:
  • rewrite (bool, optional) – Whether to rewrite files if they already exist, by default False

  • verbose (bool, optional) – Whether to print verbose outputs, by default False

  • crs (str, optional) – The CRS of the coordinates. If None, the method will first look for crs in the parents dictionary and use those. If crs cannot be found in the dictionary, the method will use “EPSG:4326”. By default None.

Return type:

None

save_patches_as_geotiffs(rewrite=False, verbose=False, crs=None)

Save all patches in MapImages instance as geotiffs.

Parameters:
  • rewrite (bool, optional) – Whether to rewrite files if they already exist, by default False

  • verbose (bool, optional) – Whether to print verbose outputs, by default False

  • crs (str, optional) – The CRS of the coordinates. If None, the method will first look for crs in the patches dictionary and use those. If crs cannot be found in the dictionary, the method will use “EPSG:4326”. By default None.

Return type:

None

save_patches_to_geojson(geojson_fname='patches.geojson', rewrite=False, crs=None)

Saves patches to a geojson file.

Parameters:
  • geojson_fname (Optional[str], optional) – The name of the geojson file, by default “patches.geojson”

  • rewrite (Optional[bool], optional) – Whether to overwrite an existing file, by default False.

  • crs (Optional[str], optional) – The CRS to use when writing the geojson. If None, the method will look for “crs” in the patches dictionary and, if found, will use that. Otherwise it will set the crs to the default value of “EPSG:4326”. By default None

Return type:

None
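
The output follows the standard GeoJSON structure; a toy FeatureCollection for a single patch polygon might look like this (field names are illustrative, not MapReader's exact schema):

```python
import json

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # A closed ring of (lon, lat) pairs for one patch.
        "coordinates": [[[-0.5, 51.0], [0.5, 51.0], [0.5, 52.0],
                         [-0.5, 52.0], [-0.5, 51.0]]],
    },
    "properties": {"image_id": "patch-0-0-100-100-#map_01.png#"},
}

collection = {"type": "FeatureCollection", "features": [feature]}
geojson_str = json.dumps(collection)
```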

mapreader.loader(path_images=None, tree_level='parent', parent_path=None, **kwargs)

Creates a MapImages class to manage a collection of image paths and construct image objects.

Parameters:
  • path_images (str or None, optional) – Path to the directory containing images (accepts wildcards). By default, None

  • tree_level (str, optional) – Level of the image hierarchy to construct. The value can be either "parent" (default) or "patch".

  • parent_path (str, optional) – Path to parent images (if applicable), by default None.

  • **kwargs (dict, optional) – Additional keyword arguments to be passed to the _images_constructor() method.

Returns:

A MapImages object which can manage a collection of image paths and construct image objects.

Return type:

MapImages

Notes

This is a wrapper method. See the documentation of the mapreader.load.images.MapImages class for more detail.

mapreader.load_patches(patch_paths, patch_file_ext=False, parent_paths=False, parent_file_ext=False, add_geo_info=False, clear_images=False)

Creates a MapImages class to manage a collection of image paths and construct image objects. Then loads patch images from the given paths and adds them to the images dictionary in the MapImages instance.

Parameters:
  • patch_paths (str) –

    The file path of the patches to be loaded.

    Note: The ``patch_paths`` parameter accepts wildcards.

  • patch_file_ext (str or bool, optional) – The file extension of the patches, ignored if file extensions are specified in patch_paths (e.g. with "./path/to/dir/*png") By default False.

  • parent_paths (str or bool, optional) –

    The file path of the parent images to be loaded. If set to False, no parents are loaded. Default is False.

    Note: The ``parent_paths`` parameter accepts wildcards.

  • parent_file_ext (str or bool, optional) – The file extension of the parent images, ignored if file extensions are specified in parent_paths (e.g. with "./path/to/dir/*png") By default False.

  • add_geo_info (bool, optional) – If True, adds geographic information to the parent image. Default is False.

  • clear_images (bool, optional) – If True, clears the images from the images dictionary before loading. Default is False.

Returns:

A MapImages object which can manage a collection of image paths and construct image objects.

Return type:

MapImages

Notes

This is a wrapper method. See the documentation of the mapreader.load.images.MapImages class for more detail.

This function also calls the mapreader.load.images.MapImages.load_patches() method. Please see the documentation for that method for more information as well.

class mapreader.SheetDownloader(metadata_path, download_url)

A class to download map sheets using metadata.

Parameters:
  • metadata_path (str)

  • download_url (str | list)

get_polygons()

For each map in metadata, creates a polygon from map geometry and saves to features dictionary.

Return type:

None

get_grid_bb(zoom_level=14)

For each map in metadata, creates a grid bounding box from map polygons and saves to features dictionary.

Parameters:

zoom_level (int, optional) – The zoom level to use when creating the grid bounding box. Later used when downloading maps, by default 14.

Return type:

None
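
Tile grids at a given zoom level follow the standard "slippy map" (XYZ) scheme; the general longitude/latitude-to-tile-index arithmetic is sketched below (this is the generic web-map formula, not SheetDownloader's own code):

```python
import math

def latlon_to_tile(lat, lon, zoom):
    # Number of tiles along each axis doubles with every zoom level.
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    # Web Mercator y: 0 at the north edge, n - 1 at the south edge.
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

tile = latlon_to_tile(51.5074, -0.1278, 14)  # central London at zoom 14
```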

extract_wfs_id_nos()

For each map in metadata, extracts WFS ID numbers from WFS information and saves to features dictionary.

Return type:

None

extract_published_dates(date_col=None)

For each map in metadata, extracts publication date and saves to features dictionary.

Parameters:

date_col (str or list, optional) –

A key or list of keys which map to the metadata field containing the publication date. Multilayer keys should be passed as a list. e.g.:

  • ”key1” will extract self.features[i]["key1"]

  • [“key1”,”key2”] will search for self.features[i]["key1"]["key2"]

If None, [“properties”][“WFS_TITLE”] will be used as keys. Date will then be extracted by regex searching for “Published: XXX”. By default None.

Return type:

None
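
The fallback regex extraction can be sketched as follows (the exact pattern MapReader uses may differ; this mirrors the documented "Published: XXX" behaviour):

```python
import re

def extract_published_year(title):
    # Look for "Published: <four-digit year>" in the title string.
    match = re.search(r"Published:?\s*(\d{4})", title)
    return int(match.group(1)) if match else None

year = extract_published_year("Ordnance Survey, 1:10560, Published: 1896")
```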

get_merged_polygon()

Creates a multipolygon representing all maps in metadata.

Return type:

None

get_minmax_latlon()

Prints minimum and maximum latitudes and longitudes of all maps in metadata.

Return type:

None

query_map_sheets_by_wfs_ids(wfs_ids, append=False, print=False)

Find map sheets by WFS ID numbers.

Parameters:
  • wfs_ids (Union[list, int]) – The WFS ID numbers of the maps to download.

  • append (bool, optional) – Whether to append to current query results list or, if False, start a new list. By default False

  • print (bool, optional) – Whether to print query results or not. By default False

Return type:

None

query_map_sheets_by_polygon(polygon, mode='within', append=False, print=False)

Find map sheets which are found within or intersecting with a defined polygon.

Parameters:
  • polygon (Polygon) – shapely Polygon

  • mode (str, optional) – The mode to use when finding maps. Options of "within", which returns all map sheets which are completely within the defined polygon, and "intersects", which returns all map sheets which intersect/overlap with the defined polygon. By default "within".

  • append (bool, optional) – Whether to append to current query results list or, if False, start a new list. By default False

  • print (bool, optional) – Whether to print query results or not. By default False

Return type:

None

Notes

Use create_polygon_from_latlons() to create polygon.
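
The difference between the two modes can be sketched with axis-aligned bounding boxes in place of shapely polygons (illustrative only, not MapReader's implementation). Boxes are (min_x, min_y, max_x, max_y):

```python
# "within": the sheet lies entirely inside the query region.
def box_within(sheet, region):
    return (sheet[0] >= region[0] and sheet[1] >= region[1]
            and sheet[2] <= region[2] and sheet[3] <= region[3])

# "intersects": the sheet overlaps the query region at all.
def box_intersects(sheet, region):
    return not (sheet[2] < region[0] or sheet[0] > region[2]
                or sheet[3] < region[1] or sheet[1] > region[3])

region = (0.0, 0.0, 10.0, 10.0)
inside = (2.0, 2.0, 4.0, 4.0)         # fully within the region
overlapping = (8.0, 8.0, 12.0, 12.0)  # crosses the region boundary
```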

query_map_sheets_by_coordinates(coords, append=False, print=False)

Find map sheets which contain a defined set of coordinates. Coordinates are (x,y).

Parameters:
  • coords (tuple) – Coordinates in (x,y) format.

  • append (bool, optional) – Whether to append to current query results list or, if False, start a new list. By default False

  • print (bool, optional) – Whether to print query results or not. By default False

Return type:

None

query_map_sheets_by_line(line, append=False, print=False)

Find map sheets which intersect with a line.

Parameters:
  • line (LineString) – shapely LineString

  • append (bool, optional) – Whether to append to current query results list or, if False, start a new list. By default False

  • print (bool, optional) – Whether to print query results or not. By default False

Return type:

None

Notes

Use create_line_from_latlons() to create line.

query_map_sheets_by_string(string, keys=None, append=False, print=False)

Find map sheets by searching for a string in a chosen metadata field.

Parameters:
  • string (str) – The string to search for. Can be raw string and use regular expressions.

  • keys (str or list, optional) –

    A key or list of keys used to get the metadata field to search in.

    Key(s) will be passed to each features dictionary. Multilayer keys should be passed as a list. e.g. [“key1”,”key2”] will search for self.features[i]["key1"]["key2"].

    If None, will search in all metadata fields. By default None.

  • append (bool, optional) – Whether to append to current query results list or, if False, start a new list. By default False

  • print (bool, optional) – Whether to print query results or not. By default False

Return type:

None

Notes

The string search is case insensitive.

print_found_queries()

Prints query results.

Return type:

None

download_all_map_sheets(path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)

Downloads all map sheets in metadata.

Parameters:
  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False.

  • download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.

  • **kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.

Return type:

None

download_map_sheets_by_wfs_ids(wfs_ids, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)

Downloads map sheets by WFS ID numbers.

Parameters:
  • wfs_ids (Union[list, int]) – The WFS ID numbers of the maps to download.

  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False.

  • download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.

  • **kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.

Return type:

None

download_map_sheets_by_polygon(polygon, path_save='maps', metadata_fname='metadata.csv', mode='within', overwrite=False, download_in_parallel=True, **kwargs)

Downloads any map sheets which are found within or intersecting with a defined polygon.

Parameters:
  • polygon (Polygon) – shapely Polygon

  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”

  • mode (str, optional) – The mode to use when finding maps. Options of "within", which returns all map sheets which are completely within the defined polygon, and "intersects", which returns all map sheets which intersect/overlap with the defined polygon. By default "within".

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False.

  • download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.

  • **kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.

Return type:

None

Notes

Use create_polygon_from_latlons() to create polygon.

download_map_sheets_by_coordinates(coords, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)

Downloads any map sheets which contain a defined set of coordinates. Coordinates are (x,y).

Parameters:
  • coords (tuple) – Coordinates in (x,y) format.

  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False.

  • download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.

  • **kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.

Return type:

None

download_map_sheets_by_line(line, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)

Downloads any map sheets which intersect with a line.

Parameters:
  • line (LineString) – shapely LineString

  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False

  • download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.

  • **kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.

Return type:

None

Notes

Use create_line_from_latlons() to create line.

download_map_sheets_by_string(string, keys=None, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)

Download map sheets by searching for a string in a chosen metadata field.

Parameters:
  • string (str) – The string to search for. Can be raw string and use regular expressions.

  • keys (str or list, optional) –

    A key or list of keys used to get the metadata field to search in.

    Key(s) will be passed to each features dictionary. Multilayer keys should be passed as a list. e.g. [“key1”,”key2”] will search for self.features[i]["key1"]["key2"].

    If None, will search in all metadata fields. By default None.

  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False.

  • download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.

  • **kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.

Return type:

None

Notes

The string search is case insensitive.

download_map_sheets_by_queries(path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)

Downloads map sheets saved as query results.

Parameters:
  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False.

  • download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.

  • **kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.

Return type:

None

hist_published_dates(**kwargs)

Plots a histogram of the publication dates of maps in metadata.

Parameters:

**kwargs (dict, optional) –

A dictionary containing keyword arguments to pass to plotting function. See matplotlib.pyplot.hist() for acceptable values.

e.g. **dict(fc='c', ec='k')

Return type:

None

Notes

bins and range are already set when plotting, so they are invalid kwargs.

plot_features_on_map(features, map_extent=None, add_id=True)

Plots boundaries of map sheets on a map using the cartopy library (if available).

Parameters:
  • map_extent (Union[str, list, tuple, None], optional) –

    The extent of the underlying map to be plotted.

    If a tuple or list, must be of the format [lon_min, lon_max, lat_min, lat_max]. If a string, only "uk", "UK" or "United Kingdom" are accepted and will limit the map extent to the UK’s boundaries. If None, the map extent will be set automatically. By default None.

  • add_id (bool, optional) – Whether to add an ID (WFS ID number) to each map sheet, by default True.

  • features (list)

Return type:

None

plot_all_metadata_on_map(map_extent=None, add_id=True)

Plots boundaries of all map sheets in metadata on a map using cartopy library (if available).

Parameters:
  • map_extent (Union[str, list, tuple, None], optional) –

    The extent of the underlying map to be plotted.

    If a tuple or list, must be of the format [lon_min, lon_max, lat_min, lat_max]. If a string, only "uk", "UK" or "United Kingdom" are accepted and will limit the map extent to the UK’s boundaries. If None, the map extent will be set automatically. By default None.

  • add_id (bool, optional) – Whether to add an ID (WFS ID number) to each map sheet, by default True.

Return type:

None

plot_queries_on_map(map_extent=None, add_id=True)

Plots boundaries of query results on a map using cartopy library (if available).

Parameters:
  • map_extent (Union[str, list, tuple, None], optional) –

    The extent of the underlying map to be plotted.

    If a tuple or list, must be of the format [lon_min, lon_max, lat_min, lat_max]. If a string, only "uk", "UK" or "United Kingdom" are accepted and will limit the map extent to the UK’s boundaries. If None, the map extent will be set automatically. By default None.

  • add_id (bool, optional) – Whether to add an ID (WFS ID number) to each map sheet, by default True.

Return type:

None

class mapreader.Downloader(download_url)

A class to download maps (without using metadata)

Parameters:

download_url (str | list)

download_map_by_polygon(polygon, zoom_level=14, path_save='maps', overwrite=False, map_name=None)

Downloads a map contained within a polygon.

Parameters:
  • polygon (Polygon) – A polygon defining the boundaries of the map

  • zoom_level (int, optional) – The zoom level to use, by default 14

  • path_save (str, optional) – Path to save map sheets, by default “maps”

  • overwrite (bool, optional) – Whether to overwrite existing maps, by default False.

  • map_name (str, optional) – Name to use when saving the map, by default None

Return type:

None

mapreader.create_polygon_from_latlons(min_lat, min_lon, max_lat, max_lon)

Creates a polygon from latitudes and longitudes.

Parameters:
  • min_lat (float) – minimum latitude

  • min_lon (float) – minimum longitude

  • max_lat (float) – maximum latitude

  • max_lon (float) – maximum longitude

Returns:

shapely Polygon

Return type:

Polygon
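
A dependency-free sketch of what this helper produces, assuming it builds an axis-aligned bounding-box Polygon; the plain-tuple ring below stands in for shapely, which expects (x, y) == (lon, lat) coordinate ordering:

```python
def polygon_ring_from_latlons(min_lat, min_lon, max_lat, max_lon):
    """Return the closed (lon, lat) corner ring of the bounding box."""
    return [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # closed ring: first point repeated
    ]

ring = polygon_ring_from_latlons(51.5, -0.2, 51.6, -0.1)
```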

mapreader.create_line_from_latlons(lat1_lon1, lat2_lon2)

Creates a line between two points.

Parameters:
  • lat1_lon1 (tuple) – Tuple of (lat, lon) defining the first point

  • lat2_lon2 (tuple) – Tuple of (lat, lon) defining the second point

Returns:

shapely LineString

Return type:

LineString
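
Correspondingly, a dependency-free sketch of the line helper, assuming the (lat, lon) input tuples are swapped into shapely's (lon, lat) coordinate order:

```python
def line_coords_from_latlons(lat1_lon1, lat2_lon2):
    """Swap each (lat, lon) tuple to (lon, lat) for a LineString."""
    (lat1, lon1), (lat2, lon2) = lat1_lon1, lat2_lon2
    return [(lon1, lat1), (lon2, lat2)]

coords = line_coords_from_latlons((51.5, -0.2), (52.0, 0.1))
```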

class mapreader.AnnotationsLoader

load(annotations, delimiter=',', images_dir=None, remove_broken=True, ignore_broken=False, patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)

Loads annotations from a csv file or dataframe and can be used to set the patch_paths_col and label_col attributes.

Parameters:
  • annotations (Union[str, pd.DataFrame]) – The annotations. Can either be the path to a csv file or a pandas.DataFrame.

  • delimiter (Optional[str], optional) – The delimiter to use when loading the csv file as a dataframe, by default “,”.

  • images_dir (Optional[str], optional) – The path to the directory in which patches are stored. This argument should be passed if image paths are different from the path saved in annotations dataframe/csv. If None, no updates will be made to the image paths in the annotations dataframe/csv. By default None.

  • remove_broken (Optional[bool], optional) – Whether to remove annotations with broken image paths. If False, annotations with broken paths will remain in annotations dataframe and may cause issues! By default True.

  • ignore_broken (Optional[bool], optional) – Whether to ignore broken image paths (only valid if remove_broken=False). If True, annotations with broken paths will remain in annotations dataframe and no error will be raised. This may cause issues! If False, annotations with broken paths will raise error. By default, False.

  • patch_paths_col (Optional[str], optional) – The name of the column containing the image paths, by default “image_path”.

  • label_col (Optional[str], optional) – The name of the column containing the image labels, by default “label”.

  • append (Optional[bool], optional) – Whether to append the annotations to a pre-existing annotations dataframe. If False, existing dataframe will be overwritten. By default True.

  • scramble_frame (Optional[bool], optional) – Whether to shuffle the rows of the dataframe, by default False.

  • reset_index (Optional[bool], optional) – Whether to reset the index of the dataframe (e.g. after shuffling), by default False.

Raises:

ValueError – If annotations is passed as something other than a string or pd.DataFrame.
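
The remove_broken / ignore_broken interaction can be sketched as follows. A list of dicts stands in for the annotations DataFrame, and the flag logic mirrors the documented behaviour rather than mapreader's actual code:

```python
import os

def filter_annotations(rows, remove_broken=True, ignore_broken=False,
                       patch_paths_col="image_path"):
    """Drop, keep, or reject rows whose image path does not exist."""
    broken = [r for r in rows if not os.path.exists(r[patch_paths_col])]
    if remove_broken:
        return [r for r in rows if r not in broken]
    if broken and not ignore_broken:
        raise ValueError(f"{len(broken)} annotations have broken image paths")
    return rows  # broken rows stay in the frame -- may cause issues later

rows = [{"image_path": os.devnull, "label": "building"},   # path exists
        {"image_path": "no/such/patch.png", "label": "no_building"}]
kept = filter_annotations(rows)
```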

show_patch(patch_id)

Display a patch and its label.

Parameters:

patch_id (str) – The image ID of the patch to show.

Return type:

None

print_unique_labels()

Prints unique labels

Raises:

ValueError – If no annotations are found.

Return type:

None

review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')

Perform image review on annotations and update labels for a given label or all labels.

Parameters:
  • label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed, by default None.

  • chunks (int, optional) – The number of images to display at a time, by default 24.

  • num_cols (int, optional) – The number of columns in the display, by default 8.

  • exclude_df (pandas.DataFrame, optional) – A DataFrame of images to exclude from review, by default None.

  • include_df (pandas.DataFrame, optional) – A DataFrame of images to include for review, by default None.

  • deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".

Return type:

None

Notes

This method reviews images with their corresponding labels and allows the user to change the label for each image.

Updated labels are saved in self.annotations and in a newly created self.reviewed DataFrame. If exclude_df is provided, images found in this df are skipped in the review process. If include_df is provided, only images found in this df are reviewed. The self.reviewed DataFrame is deduplicated based on the deduplicate_col.

show_sample(label_to_show, num_samples=9)

Show a random sample of images with the specified label (label_to_show).

Parameters:
  • label_to_show (str) – The label of the images to show.

  • num_samples (int or None, optional) – The number of images to show. If None, all images with the specified label will be shown. By default 9.

Return type:

None

create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test', context_datasets=False, context_df=None)

Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.

Parameters:
  • frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.

  • frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.

  • frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.

  • random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.

  • train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.

  • val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.

  • test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.

  • context_datasets (bool, optional) – Whether to create context datasets or not. By default False.

  • context_df (str or pandas.DataFrame, optional) – The dataframe containing all patches if using context datasets. Used to create context images. By default None.

Raises:

ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.

Return type:

None

Notes

This method saves the split datasets as a dictionary in self.datasets.

The split follows the fractional ratios provided by the user, with each subset stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in that column). The splitting is performed by running train_test_split() twice.

See PatchDataset for more information on transforms.
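
The two-stage stratified split described above can be sketched in plain Python. mapreader uses sklearn-style train_test_split; the grouping-by-label below is a simplified, deterministic stand-in:

```python
def stratified_split(labels, frac_train=0.7, frac_val=0.15, frac_test=0.15):
    """Split indices into train/val/test, stratified by label."""
    if abs(frac_train + frac_val + frac_test - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    splits = {"train": [], "val": [], "test": []}
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    for idxs in by_label.values():
        n_train = round(len(idxs) * frac_train)
        # second "train_test_split": divide the remainder between val/test
        n_val = round(len(idxs) * frac_val)
        splits["train"] += idxs[:n_train]
        splits["val"] += idxs[n_train:n_train + n_val]
        splits["test"] += idxs[n_train + n_val:]
    return splits

labels = ["rail"] * 10 + ["no_rail"] * 10
splits = stratified_split(labels)
```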

create_patch_datasets(train_transform, val_transform, test_transform, df_train, df_val, df_test)

create_patch_context_datasets(context_df, train_transform, val_transform, test_transform, df_train, df_val, df_test)

create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)

Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders, and returns it.

Parameters:
  • batch_size (int, optional) – The batch size to use for the dataloader. By default 16.

  • sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset.

  • shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.

  • num_workers (int, optional) – The number of worker threads to use for loading data. By default 0.

  • **kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.

Returns:

Dictionary containing dataloaders.

Return type:

Dict

Notes

sampler will only be applied to the training dataset (datasets[“train”]).
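
The note above amounts to attaching the sampler to one split only; a minimal dictionary sketch (the dataset values are placeholders, not PatchDataset objects):

```python
def build_dataloaders(datasets, sampler=None):
    """Attach the sampler only to the "train" split, as documented."""
    return {name: {"data": ds,
                   "sampler": sampler if name == "train" else None}
            for name, ds in datasets.items()}

loaders = build_dataloaders({"train": [1, 2], "val": [3], "test": [4]},
                            sampler="weighted")
```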

class mapreader.PatchDataset(patch_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB')

Bases: torch.utils.data.Dataset

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader. Subclasses could also optionally implement __getitems__(), to speed up batched sample loading. This method accepts a list of indices of samples for a batch and returns a list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.

Parameters:
  • patch_df (pandas.DataFrame | str)

  • transform (str | torchvision.transforms.Compose | Callable)

  • delimiter (str)

  • patch_paths_col (str | None)

  • label_col (str | None)

  • label_index_col (str | None)

  • image_mode (str | None)

return_orig_image(idx)

Return the original image associated with the given index.

Parameters:

idx (int or Tensor) – The index of the desired image, or a Tensor containing the index.

Returns:

The original image associated with the given index.

Return type:

PIL.Image.Image

Notes

This method returns the original image associated with the given index by loading the image file using the file path stored in the patch_paths_col column of the patch_df DataFrame at the given index. The loaded image is then converted to the format specified by the image_mode attribute of the object. The resulting PIL.Image.Image object is returned.

create_dataloaders(set_name='infer', batch_size=16, shuffle=False, num_workers=0, **kwargs)

Creates a dictionary containing a PyTorch dataloader.

Parameters:
  • set_name (str, optional) – The name to use for the dataloader.

  • batch_size (int, optional) – The batch size to use for the dataloader. By default 16.

  • shuffle (bool, optional) – Whether to shuffle the PatchDataset, by default False

  • num_workers (int, optional) – The number of worker threads to use for loading data. By default 0.

  • **kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.

Returns:

Dictionary containing dataloaders.

Return type:

Dict

class mapreader.PatchContextDataset(patch_df, total_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB', context_dir='./maps/maps_context', create_context=False, parent_path='./maps')

Bases: PatchDataset

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader. Subclasses could also optionally implement __getitems__(), to speed up batched sample loading. This method accepts a list of indices of samples for a batch and returns a list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.

Parameters:
  • patch_df (pandas.DataFrame | str)

  • total_df (pandas.DataFrame | str)

  • transform (str)

  • delimiter (str)

  • patch_paths_col (str | None)

  • label_col (str | None)

  • label_index_col (str | None)

  • image_mode (str | None)

  • context_dir (str | None)

  • create_context (bool)

  • parent_path (str | None)

save_context(processors=10, sleep_time=0.001, use_parhugin=True, overwrite=False)

Save context images for all patches in the patch_df.

Parameters:
  • processors (int, optional) – The number of required processors for the job, by default 10.

  • sleep_time (float, optional) – The time to wait between jobs, by default 0.001.

  • use_parhugin (bool, optional) – Whether to use Parhugin to parallelize the job, by default True.

  • overwrite (bool, optional) – Whether to overwrite existing parent files, by default False.

Return type:

None

Notes

Parhugin is a Python package for parallelizing computations across multiple CPU cores. This method uses Parhugin to parallelize saving context images to disk. When Parhugin is installed and use_parhugin is set to True, the method parallelizes calls to the get_context_id method with their corresponding arguments. If Parhugin is not installed or use_parhugin is set to False, the method executes the loop over patch indices sequentially instead.

get_context_id(id, overwrite=False, save_context=False, return_image=True)

Save the parents of a specific patch to the specified location.

Parameters:
  • id – Index of the patch in the dataset.

  • overwrite (bool, optional) – Whether to overwrite the existing parent files. Default is False.

  • save_context (bool, optional) – Whether to save the context image. Default is False.

  • return_image (bool, optional) – Whether to return the context image. Default is True.

Raises:

ValueError – If the patch is not found in the dataset.

Return type:

None

plot_sample(idx)

Plot a sample patch and its corresponding context from the dataset.

Parameters:

idx (int) – The index of the sample to plot.

Returns:

Displays the plot of the sample patch and its corresponding context.

Return type:

None

Notes

This method plots a sample patch and its corresponding context side-by-side in a single figure with two subplots. The figure size is set to 10in x 5in, and the titles of the subplots are set to “Patch” and “Context”, respectively. The resulting figure is displayed using the matplotlib library (required).

class mapreader.ClassifierContainer(model, labels_map, dataloaders=None, device='default', input_size=(224, 224), is_inception=False, load_path=None, force_device=False, **kwargs)

Parameters:
  • model (str | torch.nn.Module | None)

  • labels_map (dict[int, str] | None)

  • dataloaders (dict[str, torch.utils.data.DataLoader] | None)

  • device (str | None)

  • input_size (int | None)

  • is_inception (bool)

  • load_path (str | None)

  • force_device (bool | None)

generate_layerwise_lrs(min_lr, max_lr, spacing='linspace')

Calculates layer-wise learning rates for a given set of model parameters.

Parameters:
  • min_lr (float) – The minimum learning rate to be used.

  • max_lr (float) – The maximum learning rate to be used.

  • spacing (str, optional) – The type of sequence to use for spacing the specified interval learning rates. Can be either "linspace" or "geomspace", where “linspace” uses evenly spaced learning rates over a specified interval and “geomspace” uses learning rates spaced evenly on a log scale (a geometric progression). By default "linspace".

Returns:

A list of dictionaries containing the parameters and learning rates for each layer.

Return type:

list of dicts
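
The two spacing options can be sketched without numpy; the layer names below are illustrative, not tied to a specific model:

```python
def layerwise_lrs(param_names, min_lr, max_lr, spacing="linspace"):
    """One learning rate per parameter group, from min_lr to max_lr."""
    n = len(param_names)
    groups = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        if spacing == "linspace":
            lr = min_lr + t * (max_lr - min_lr)   # evenly spaced
        elif spacing == "geomspace":
            lr = min_lr * (max_lr / min_lr) ** t  # evenly spaced on log scale
        else:
            raise NotImplementedError(spacing)
        groups.append({"params": param_names[i], "lr": lr})
    return groups

groups = layerwise_lrs(["layer1", "layer2", "layer3"], 1e-4, 1e-2, "geomspace")
```

Earlier (typically lower) layers get smaller learning rates, later layers larger ones.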

initialize_optimizer(optim_type='adam', params2optimize='default', optim_param_dict=None, add_optim=True)

Initializes an optimizer for the model and adds it to the classifier object.

Parameters:
  • optim_type (str, optional) – The type of optimizer to use. Can be set to "adam" (default), "adamw", or "sgd".

  • params2optimize (str or iterable, optional) – The parameters to optimize. If set to "default", all model parameters that require gradients will be optimized. Default is "default".

  • optim_param_dict (dict, optional) – The parameters to pass to the optimizer constructor as a dictionary, by default {"lr": 1e-3}.

  • add_optim (bool, optional) – If True, adds the optimizer to the classifier object, by default True.

Returns:

optimizer – The initialized optimizer. Only returned if add_optim is set to False.

Return type:

torch.optim.Optimizer

Notes

If add_optim is True, the optimizer will be added to object.

Note that the first argument of an optimizer is the parameters to optimize, e.g. params2optimize = model_ft.parameters():

  • model_ft.parameters(): all parameters are being optimized

  • model_ft.fc.parameters(): only parameters of final layer are being optimized

Here, we use:

filter(lambda p: p.requires_grad, self.model.parameters())

add_optimizer(optimizer)

Add an optimizer to the classifier object.

Parameters:

optimizer (torch.optim.Optimizer) – The optimizer to add to the classifier object.

Return type:

None

initialize_scheduler(scheduler_type='steplr', scheduler_param_dict=None, add_scheduler=True)

Initializes a learning rate scheduler for the optimizer and adds it to the classifier object.

Parameters:
  • scheduler_type (str, optional) – The type of learning rate scheduler to use. Can be either "steplr" (default) or "onecyclelr".

  • scheduler_param_dict (dict, optional) – The parameters to pass to the scheduler constructor, by default {"step_size": 10, "gamma": 0.1}.

  • add_scheduler (bool, optional) – If True, adds the scheduler to the classifier object, by default True.

Raises:

ValueError – If the specified scheduler_type is not implemented.

Returns:

scheduler – The initialized learning rate scheduler. Only returned if add_scheduler is set to False.

Return type:

torch.optim.lr_scheduler._LRScheduler

add_scheduler(scheduler)

Add a scheduler to the classifier object.

Parameters:

scheduler (torch.optim.lr_scheduler._LRScheduler) – The scheduler to add to the classifier object.

Raises:

ValueError – If no optimizer has been set. Use initialize_optimizer or add_optimizer to set an optimizer first.

Return type:

None

add_criterion(criterion='cross entropy')

Add a loss criterion to the classifier object.

Parameters:

criterion (str or torch.nn.modules.loss._Loss) – The loss criterion to add to the classifier object. Accepted string values are “cross entropy” or “ce” (cross-entropy), “bce” (binary cross-entropy) and “mse” (mean squared error).

Returns:

The function only modifies the criterion attribute of the classifier and does not return anything.

Return type:

None

model_summary(input_size=None, trainable_col=False, **kwargs)

Print a summary of the model.

Parameters:
  • input_size (tuple or list, optional) – The size of the input data. If None, input size is taken from “train” dataloader (self.dataloaders["train"]).

  • trainable_col (bool, optional) – If True, adds a column showing which parameters are trainable. Defaults to False.

  • **kwargs (Dict) – Keyword arguments to pass to torchinfo.summary() (see https://github.com/TylerYep/torchinfo).

Return type:

None

Notes

Other ways to check params:

sum(p.numel() for p in myclassifier.model.parameters())
sum(p.numel() for p in myclassifier.model.parameters()
    if p.requires_grad)

And:

for name, param in self.model.named_parameters():
    n = name.split(".")[0].split("_")[0]
    print(name, param.requires_grad)

freeze_layers(layers_to_freeze=None)

Freezes the specified layers in the neural network by setting requires_grad attribute to False for their parameters.

Parameters:

layers_to_freeze (list of str, optional) – List of names of the layers to freeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are frozen. Otherwise, only the parameters with an exact match to the layer name are frozen. By default, [].

Returns:

The function only modifies the requires_grad attribute of the specified parameters and does not return anything.

Return type:

None

Notes

Wildcards are accepted in the layers_to_freeze parameter.
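
The wildcard matching can be sketched over plain name/flag pairs; the parameter names below are illustrative, and the helper is a stand-in for the documented behaviour rather than mapreader's code:

```python
def freeze(requires_grad, layers_to_freeze):
    """Set requires_grad=False for matching parameter names.
    A trailing '*' freezes every name containing the prefix;
    otherwise only an exact match is frozen."""
    for pattern in layers_to_freeze:
        for name in requires_grad:
            if pattern.endswith("*"):
                if pattern[:-1] in name:
                    requires_grad[name] = False
            elif name == pattern:
                requires_grad[name] = False
    return requires_grad

params = {"layer1.weight": True, "layer1.bias": True, "fc.weight": True}
frozen = freeze(params, ["layer1*"])
```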

unfreeze_layers(layers_to_unfreeze=None)

Unfreezes the specified layers in the neural network by setting requires_grad attribute to True for their parameters.

Parameters:

layers_to_unfreeze (list of str, optional) – List of names of the layers to unfreeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are unfrozen. Otherwise, only the parameters with an exact match to the layer name are unfrozen. By default, [].

Returns:

The function only modifies the requires_grad attribute of the specified parameters and does not return anything.

Return type:

None

Notes

Wildcards are accepted in the layers_to_unfreeze parameter.

only_keep_layers(only_keep_layers_list=None)

Only keep the specified layers (only_keep_layers_list) for gradient computation during the backpropagation.

Parameters:

only_keep_layers_list (list, optional) – List of layer names to keep. All other layers will have their gradient computation turned off. Default is [].

Returns:

The function only modifies the requires_grad attribute of the specified parameters and does not return anything.

Return type:

None

inference(set_name='infer', verbose=False, print_info_batch_freq=5)

Run inference on a specified dataset (set_name).

Parameters:
  • set_name (str, optional) – The name of the dataset to run inference on, by default "infer".

  • verbose (bool, optional) – Whether to print verbose outputs, by default False.

  • print_info_batch_freq (int, optional) – The frequency of printouts, by default 5.

Return type:

None

Notes

This method calls the mapreader.train.classifier.classifier.train() method with the num_epochs set to 1 and all the other parameters specified in the function arguments.

train_component_summary()

Print a summary of the optimizer, criterion, and trainable model components.

Returns:

None

Return type:

None

train(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, remove_after_load=True, print_info_batch_freq=5)

Train the model on the specified phases for a given number of epochs.

Wrapper function for mapreader.train.classifier.classifier.train_core() method to capture exceptions (KeyboardInterrupt is the only supported exception currently).

Parameters:
  • phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].

  • num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.

  • save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.

  • verbose (int, optional) – Whether to print verbose outputs, by default False.

  • tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.

  • tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.

  • remove_after_load (bool, optional) – Whether to remove the temporary file after loading it. Default is True.

  • print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.

Returns:

The function saves the model to the save_model_dir directory, and optionally to a temporary file. If interrupted with a KeyboardInterrupt, the function tries to load the temporary file. If no temporary file is found, it continues without loading.

Return type:

None

Notes

Refer to the documentation of mapreader.train.classifier.classifier.train_core() for more information.
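
The KeyboardInterrupt handling described above amounts to a try/except around the core training loop. The sketch below substitutes a toy train function and checkpoint path for mapreader's internals:

```python
import os

def train_with_interrupt_recovery(train_core, tmp_file):
    """Run train_core; on Ctrl-C, fall back to the last temporary
    checkpoint if one exists (mirroring the documented behaviour)."""
    try:
        return train_core()
    except KeyboardInterrupt:
        if os.path.exists(tmp_file):
            return f"loaded checkpoint from {tmp_file}"
        return "no checkpoint found, continuing without loading"

def interrupted_train():  # toy stand-in for train_core()
    raise KeyboardInterrupt

result = train_with_interrupt_recovery(interrupted_train, "missing.pt")
```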

train_core(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, print_info_batch_freq=5)

Trains/fine-tunes a classifier for the specified number of epochs on the given phases using the specified hyperparameters.

Parameters:
  • phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].

  • num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.

  • save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.

  • verbose (bool, optional) – Whether to print verbose outputs, by default False.

  • tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.

  • tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.

  • print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.

Raises:
  • ValueError

    If the criterion is not set. Use the add_criterion method to set the criterion.

    If the optimizer is not set and the phase is “train”. Use the initialize_optimizer or add_optimizer method to set the optimizer.

  • KeyError – If the specified phase cannot be found in the keys of the object’s dataloaders dictionary property.

Return type:

None

calculate_add_metrics(y_true, y_pred, y_score, phase, epoch=-1, tboard_writer=None)

Calculate and add metrics to the classifier’s metrics dictionary.

Parameters:
  • y_true (array-like of shape (n_samples,)) – True binary labels or multiclass labels. Can be considered ground truth or (correct) target values.

  • y_pred (array-like of shape (n_samples,)) – Predicted binary labels or multiclass labels. The estimated targets as returned by a classifier.

  • y_score (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class. Only required when y_pred is not binary.

  • phase (str) – Name of the current phase, typically "train" or "val". See train function.

  • epoch (int, optional) – Current epoch number. Default is -1.

  • tboard_writer (object, optional) – TensorBoard SummaryWriter object to write the metrics. Default is None.

Return type:

None

Notes

This method uses both the sklearn.metrics.precision_recall_fscore_support and sklearn.metrics.roc_auc_score functions from scikit-learn to calculate the metrics for each average type ("micro", "macro" and "weighted"). The results are then added to the metrics dictionary. It also writes the metrics to the TensorBoard SummaryWriter, if tboard_writer is not None.
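
The micro vs. macro averaging mentioned above can be sketched for precision alone (scikit-learn's precision_recall_fscore_support computes the full set of metrics):

```python
def precision_by_average(y_true, y_pred, average):
    """Micro: pool all decisions; macro: mean of per-class precisions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1
    if average == "micro":
        total_tp, total_fp = sum(tp.values()), sum(fp.values())
        return total_tp / (total_tp + total_fp)
    per_class = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
                 for c in classes]
    return sum(per_class) / len(per_class)  # "macro"

y_true = ["a", "a", "a", "b"]
y_pred = ["a", "b", "b", "b"]
```

Here micro precision pools every prediction (2 correct out of 4), while macro averages the per-class scores, so a small class weighs as much as a large one.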

plot_metric(y_axis, y_label, legends, x_axis='epoch', x_label='epoch', colors=5 * ['k', 'tab:red'], styles=10 * ['-'], markers=10 * ['o'], figsize=(10, 5), plt_yrange=None, plt_xrange=None)

Plot the metrics of the classifier object.

Parameters:
  • y_axis (list of str) – A list of metric names to be plotted on the y-axis.

  • y_label (str) – The label for the y-axis.

  • legends (list of str) – The legend labels for each metric.

  • x_axis (str, optional) – The metric to be used as the x-axis. Can be "epoch" (default) or any other metric name present in the dataset.

  • x_label (str, optional) – The label for the x-axis. Defaults to "epoch".

  • colors (list of str, optional) – The colors to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 5 * ["k", "tab:red"].

  • styles (list of str, optional) – The line styles to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["-"].

  • markers (list of str, optional) – The markers to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["o"].

  • figsize (tuple of int, optional) – The size of the figure in inches. Defaults to (10, 5).

  • plt_yrange (tuple of float, optional) – The range of values for the y-axis. Defaults to None.

  • plt_xrange (tuple of float, optional) – The range of values for the x-axis. Defaults to None.

Return type:

None

Notes

This function requires the matplotlib package.

show_sample(set_name='train', batch_number=1, print_batch_info=True, figsize=(15, 10))

Displays a sample of training or validation data in a grid format with their corresponding class labels.

Parameters:
  • set_name (str, optional) – Name of the dataset ("train"/"validation") to display the sample from, by default "train".

  • batch_number (int, optional) – Which batch to display, by default 1.

  • print_batch_info (bool, optional) – Whether to print information about the batch size, by default True.

  • figsize (tuple, optional) – Figure size (width, height) in inches, by default (15, 10).

Returns:

Displays the sample images with their corresponding class labels.

Return type:

None

Raises:

StopIteration – If the specified number of batches to display exceeds the total number of batches in the dataset.

Notes

This method uses the dataloader of the ImageClassifierData class and the torchvision.utils.make_grid function to display the sample data in a grid format. It also calls the _imshow method of the ImageClassifierData class to show the sample data.

print_batch_info(set_name='train')

Print information about a dataset’s batches, samples, and batch size.

Parameters:

set_name (str, optional) – Name of the dataset to display batch information for (default is "train").

Return type:

None

show_inference_sample_results(label, num_samples=6, set_name='test', min_conf=None, max_conf=None, figsize=(15, 15))

Shows a sample of the results of the inference.

Parameters:
  • label (str) – The label for which to display results.

  • num_samples (int, optional) – The number of sample results to display. Defaults to 6.

  • set_name (str, optional) – The name of the dataset split to use for inference. Defaults to "test".

  • min_conf (float, optional) – The minimum confidence score for a sample result to be displayed. Samples with lower confidence scores will be skipped. Defaults to None.

  • max_conf (float, optional) – The maximum confidence score for a sample result to be displayed. Samples with higher confidence scores will be skipped. Defaults to None.

  • figsize (tuple[int, int], optional) – Figure size (width, height) in inches, displaying the sample results. Defaults to (15, 15).

Return type:

None

save(save_path='default.obj', force=False)

Save the object to a file.

Parameters:
  • save_path (str, optional) – The path to the file to write. If the file already exists and force is not True, a FileExistsError is raised. Defaults to "default.obj".

  • force (bool, optional) – Whether to overwrite the file if it already exists. Defaults to False.

Raises:

FileExistsError – If the file already exists and force is not True.

Return type:

None

Notes

The object is saved in two parts. First, a serialized copy of the object’s dictionary is written to the specified file using the joblib.dump function. The object’s model attribute is excluded from this dictionary and saved separately using the torch.save function, with a filename derived from the original save_path.
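The two-part save described above can be sketched in a simplified, standalone form. This is purely illustrative: it uses pickle in place of joblib and torch.save, and a plain dict in place of the model, to show the overwrite guard and the derived model filename.

```python
import os
import pickle


def save_object(obj: dict, save_path: str = "default.obj", force: bool = False) -> None:
    """Save `obj` in two parts: the dict without its 'model' entry, plus the
    model itself in a sibling file (illustrative stand-in for joblib/torch)."""
    if os.path.exists(save_path) and not force:
        raise FileExistsError(f"{save_path} already exists; pass force=True to overwrite.")
    # Serialize everything except the model
    state = {k: v for k, v in obj.items() if k != "model"}
    with open(save_path, "wb") as f:
        pickle.dump(state, f)
    # Save the model separately, with a filename derived from save_path
    model_path = f"{os.path.splitext(save_path)[0]}_model.obj"
    with open(model_path, "wb") as f:
        pickle.dump(obj.get("model"), f)
```

The same force semantics apply as in save: an existing file raises FileExistsError unless force=True is passed.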

save_predictions(set_name, save_path=None, delimiter=',')

Saves the predictions for the given dataset to a delimited file.

Parameters:
  • set_name (str)

  • save_path (str | None)

  • delimiter (str)

load_dataset(dataset, set_name, batch_size=16, sampler=None, shuffle=False, num_workers=0, **kwargs)

Creates a DataLoader from a PatchDataset and adds it to the dataloaders dictionary.

Parameters:
  • dataset (PatchDataset) – The dataset to add

  • set_name (str) – The name to use for the dataset

  • batch_size (Optional[int], optional) – The batch size to use when creating the DataLoader, by default 16

  • sampler (Optional[Union[Sampler, None]], optional) – The sampler to use when creating the DataLoader, by default None

  • shuffle (Optional[bool], optional) – Whether to shuffle the PatchDataset, by default False

  • num_workers (Optional[int], optional) – The number of worker threads to use for loading data, by default 0.

Return type:

None
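The bookkeeping behind load_dataset can be illustrated without PyTorch. The sketch below keeps a dataloaders dictionary keyed by set_name, each value yielding batches of the requested size; the class and batching logic here are a toy stand-in, not the real method, which builds a torch.utils.data.DataLoader.

```python
import random


class MiniContainer:
    """Toy stand-in for ClassifierContainer's dataloader registry."""

    def __init__(self):
        self.dataloaders = {}

    def load_dataset(self, dataset, set_name, batch_size=16, shuffle=False):
        items = list(dataset)
        if shuffle:
            random.shuffle(items)
        # Group items into batches of `batch_size` (last batch may be short)
        batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
        self.dataloaders[set_name] = batches
```

Registering the same set_name again simply replaces the previous entry in the dictionary, mirroring how a new DataLoader would overwrite an old one.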

load(load_path, force_device=False)

Loads the state of a class instance from a saved file using the joblib library, then loads the accompanying PyTorch model from a separate file and maps it to the device used to load the class instance.

Parameters:
  • load_path (str) – Path to the saved file to load.

  • force_device (bool or str, optional) – Whether to force the use of a specific device, or the name of the device to use. If set to True, the default device is used. Defaults to False.

Raises:

FileNotFoundError – If the specified file does not exist.

Return type:

None

cprint(type_info, bc_color, text)

Print colored text with additional information.

Parameters:
  • type_info (str) – The type of message to display.

  • bc_color (str) – The color to use for the message text.

  • text (str) – The text to display.

Returns:

The colored message is displayed on the standard output stream.

Return type:

None
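Colored terminal output of this kind is typically produced with ANSI escape codes. The sketch below shows the general technique; the color names, codes, and message format are illustrative, not MapReader's own palette.

```python
# Minimal ANSI color table (illustrative, not MapReader's actual codes)
ANSI = {"red": "\033[91m", "green": "\033[92m", "bold": "\033[1m"}
RESET = "\033[0m"


def cprint_sketch(type_info: str, bc_color: str, text: str) -> str:
    """Print `[type_info] text` wrapped in the requested ANSI color."""
    color = ANSI.get(bc_color, "")  # fall back to uncolored output
    message = f"{color}[{type_info}] {text}{RESET}"
    print(message)
    return message
```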

update_progress(progress, text='', barLength=30)

Update the progress bar.

Parameters:
  • progress (float or int) – The progress value to display, between 0 and 1. If an integer is provided, it will be converted to a float. If a value outside the range [0, 1] is provided, it will be clamped to the nearest valid value.

  • text (str, optional) – Additional text to display after the progress bar, defaults to "".

  • barLength (int, optional) – The length of the progress bar in characters, defaults to 30.

Raises:

TypeError – If progress is not a floating point value or an integer.

Returns:

The progress bar is displayed on the standard output stream.

Return type:

None
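The clamping and rendering described above can be sketched as a standalone helper. This is an illustrative reimplementation, not MapReader's exact output format.

```python
def progress_bar(progress, text="", bar_length=30):
    """Return a text progress bar; clamps `progress` into [0, 1]."""
    if not isinstance(progress, (int, float)):
        raise TypeError("progress must be a float or an integer")
    progress = min(max(float(progress), 0.0), 1.0)  # clamp to [0, 1]
    filled = int(round(bar_length * progress))
    bar = "#" * filled + "-" * (bar_length - filled)
    return f"[{bar}] {progress * 100:.1f}% {text}".rstrip()
```

For example, progress_bar(0.5, bar_length=10) returns "[#####-----] 50.0%", and out-of-range values such as 2 or -1 are clamped rather than raising.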

class mapreader.Annotator(patch_df=None, parent_df=None, labels=None, patch_paths=None, parent_paths=None, metadata_path=None, annotations_dir='./annotations', patch_paths_col='image_path', label_col='label', show_context=False, auto_save=True, delimiter=',', sortby=None, ascending=True, username=None, task_name=None, min_values=None, max_values=None, filter_for=None, surrounding=1, max_size=1000, resize_to=None)

Bases: pandas.DataFrame

Annotator class for annotating patches with labels.

Parameters:
  • patch_df (str or pd.DataFrame or None, optional) – Path to a CSV file or a pandas DataFrame containing patch data, by default None

  • parent_df (str or pd.DataFrame or None, optional) – Path to a CSV file or a pandas DataFrame containing parent data, by default None

  • labels (list, optional) – List of labels for annotation, by default None

  • patch_paths (str or None, optional) – Path to patch images, by default None. Ignored if patch_df is provided.

  • parent_paths (str or None, optional) – Path to parent images, by default None. Ignored if parent_df is provided.

  • metadata_path (str or None, optional) – Path to metadata CSV file, by default None

  • annotations_dir (str, optional) – Directory to store annotations, by default “./annotations”

  • patch_paths_col (str, optional) – Name of the column in which image paths are stored in patch DataFrame, by default “image_path”

  • label_col (str, optional) – Name of the column in which labels are stored in patch DataFrame, by default “label”

  • show_context (bool, optional) – Whether to show context when loading patches, by default False

  • auto_save (bool, optional) – Whether to automatically save annotations, by default True

  • delimiter (str, optional) – Delimiter used in CSV files, by default “,”

  • sortby (str or None, optional) – Name of the column to use to sort the patch DataFrame, by default None. Default sort order is ascending=True. Pass ascending=False keyword argument to sort in descending order.

  • ascending (bool, optional) – Whether to sort the DataFrame in ascending order when using the sortby argument, by default True.

  • username (str or None, optional) – Username to use when saving annotations file, by default None. If not provided, a random string is generated.

  • task_name (str or None, optional) – Name of the annotation task, by default None.

  • min_values (dict, optional) – A dictionary consisting of column names (keys) and minimum values as floating point values (values), by default None.

  • max_values (dict, optional) – A dictionary consisting of column names (keys) and maximum values as floating point values (values), by default None.

  • filter_for (dict, optional) – A dictionary consisting of column names (keys) and values to filter for (values), by default None.

  • surrounding (int, optional) – The number of surrounding images to show for context, by default 1.

  • max_size (int, optional) – The size in pixels of the longest side to which to constrain each patch image, by default 1000.

  • resize_to (int or None, optional) – The size in pixels of the longest side to which to resize each patch image, by default None.

Raises:
  • FileNotFoundError – If the provided patch_df or parent_df file path does not exist

  • ValueError – If patch_df or parent_df is not a valid path to a CSV file or a pandas DataFrame; if neither patch_df nor patch_paths is provided; if the DataFrame does not have the required columns; if sortby is not a string or None; or if labels is not provided as a list.

  • SyntaxError – If labels provided are not in the form of a list

property filtered: pandas.DataFrame
Return type:

pandas.DataFrame

get_queue(as_type='list')

Gets the indices of rows which are eligible for annotation.

Parameters:

as_type (str, optional) – The format in which to return the indices. Options: “list”, “index”. Default is “list”. If any other value is provided, it returns a pandas.Series.

Returns:

Depending on “as_type”, returns either a list of indices, a pd.Index object, or a pd.Series of eligible rows.

Return type:

List[int] or pandas.Index or pandas.Series
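The eligibility filtering behind the annotation queue (the min_values/max_values parameters) can be sketched with plain pandas. The function, column names, and toy DataFrame below are illustrative, not the Annotator's internal code.

```python
import pandas as pd


def eligible_queue(patch_df, label_col="label", min_values=None, max_values=None):
    """Return the indices of unlabelled rows that satisfy the given bounds."""
    mask = patch_df[label_col].isna()  # only rows not yet annotated
    for col, lo in (min_values or {}).items():
        mask &= patch_df[col] >= lo
    for col, hi in (max_values or {}).items():
        mask &= patch_df[col] <= hi
    return list(patch_df.index[mask])


df = pd.DataFrame(
    {"mean_pixel": [0.1, 0.5, 0.9], "label": [None, None, "rail"]},
    index=["patch_a", "patch_b", "patch_c"],
)
queue = eligible_queue(df, min_values={"mean_pixel": 0.3})  # ["patch_b"]
```

Already-labelled rows (here "patch_c") drop out of the queue regardless of the bounds, since only unannotated patches are eligible.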

get_context()

Provides the surrounding context for the patch to be annotated.

Returns:

An IPython VBox widget containing the surrounding patches for context.

Return type:

ipywidgets.VBox

annotate(show_context=None, sortby=None, ascending=None, min_values=None, max_values=None, surrounding=None, resize_to=None, max_size=None)

Annotate at the patch level. Renders the annotation interface, starting from the first image in the queue.

Parameters:
  • show_context (bool or None, optional) – Whether or not to display the surrounding context for each image. Default is None.

  • sortby (str or None, optional) – Name of the column to use to sort the patch DataFrame, by default None. Default sort order is ascending=True. Pass ascending=False keyword argument to sort in descending order.

  • ascending (bool, optional) – Whether to sort the DataFrame in ascending order when using the sortby argument, by default True.

  • min_values (dict or None, optional) – Minimum values for each property to filter images for annotation. It should be provided as a dictionary consisting of column names (keys) and minimum values as floating point values (values). Default is None.

  • max_values (dict or None, optional) – Maximum values for each property to filter images for annotation. It should be provided as a dictionary consisting of column names (keys) and maximum values as floating point values (values). Default is None.

  • surrounding (int or None, optional) – The number of surrounding images to show for context. Default: 1.

  • max_size (int or None, optional) – The size in pixels of the longest side to which to constrain each patch image. Default is None.

  • resize_to (int | None)

Return type:

None

Notes

This method is a wrapper for the _annotate method.

render()

Displays the image at the current index in the annotation interface.

If the current index is greater than or equal to the length of the dataframe, the method disables the “next” button and saves the data.

Return type:

None

get_patch_image(ix)

Returns the image at the given index.

Parameters:

ix (int | str) – The index of the image in the dataframe.

Returns:

A PIL.Image object of the image at the given index.

Return type:

PIL.Image

get_labelled_data(sort=True, index_labels=False, include_paths=True)

Returns the annotations made so far.

Parameters:
  • sort (bool, optional) – Whether to sort the dataframe by the order of the images in the input data, by default True

  • index_labels (bool, optional) – Whether to return the label’s index number (in the labels list provided in setting up the instance) or the human-readable label for each row, by default False

  • include_paths (bool, optional) – Whether to return a column containing the full path to the annotated image or not, by default True

Returns:

A dataframe containing the labelled images and their associated label index.

Return type:

pandas.DataFrame
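The index_labels option maps each human-readable label back to its position in the labels list supplied when setting up the Annotator. A minimal sketch of that mapping (an illustrative reimplementation on a toy DataFrame, not the class's internal code):

```python
import pandas as pd


def labelled_data(df, labels, label_col="label", index_labels=False):
    """Return only annotated rows; optionally replace each label by its index."""
    out = df[df[label_col].notna()].copy()
    if index_labels:
        # Replace human-readable labels with their position in `labels`
        out[label_col] = out[label_col].map(labels.index)
    return out


df = pd.DataFrame({"label": ["rail", None, "no_rail"]}, index=["a", "b", "c"])
result = labelled_data(df, labels=["no_rail", "rail"], index_labels=True)
```

With index_labels=True, "rail" becomes 1 and "no_rail" becomes 0, since those are their positions in the labels list; unannotated rows are dropped.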

render_complete()

Renders the completion message once all images have been annotated.

Return type:

None