mapreader
Subpackages
Classes
A class to download map sheets using metadata.
A class to download maps (without using metadata).
A class to manage a collection of image paths and construct image objects.
Annotator class for annotating patches with labels.
A class for loading annotations and preparing datasets and dataloaders for training.
A PyTorch Dataset class for loading image patches from a DataFrame.
A PyTorch Dataset class for loading contextual information about image patches.
A class to store and train a PyTorch model.
A class for post-processing predictions on patches using the surrounding context.
A class for carrying out occlusion analysis on patches.
Functions
Creates a polygon from latitudes and longitudes.
Creates a line between two points.
Creates a MapImages class to manage a collection of image paths and construct image objects.
Creates a MapImages class and loads patch images from the given paths.
Print the current version of mapreader.
Package Contents
- class mapreader.SheetDownloader(metadata_path, download_url)
A class to download map sheets using metadata.
- Parameters:
metadata_path (str)
download_url (str | list)
- found_queries
- merged_polygon = None
- tile_server
- crs
- get_grid_bb(zoom_level=14)
Creates a grid bounding box for each map in metadata.
- Parameters:
zoom_level (int, optional) – The zoom level to use when creating the grid bounding box. Later used when downloading maps, by default 14.
- Return type:
None
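The zoom_level parameter follows the standard XYZ ("slippy map") tiling scheme. The formula below converts a coordinate to tile indices at a given zoom; it is the usual Web Mercator tile formula, shown for intuition about what zoom_level controls, not code taken from MapReader's source.

```python
import math

def latlon_to_tile(lat, lon, zoom):
    """Standard XYZ (slippy-map) tile indices for a coordinate at a given
    zoom level. Illustrative only; not MapReader's implementation."""
    n = 2 ** zoom  # the world is an n x n grid of tiles at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

# At zoom 0 the whole world is a single tile; each zoom level doubles the
# grid in both directions, so zoom 14 addresses a 16384 x 16384 tile grid.
```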
- extract_wfs_id_nos()
Extracts WFS ID numbers from metadata.
- Return type:
None
- extract_published_dates(date_col=None)
Extracts publication dates from metadata.
- Parameters:
date_col (str or None, optional) – A string indicating the metadata column containing the publication date. If None, “WFS_TITLE” will be used. Date will then be extracted by regex searching for “Published: XXX”. By default None.
- Return type:
None
- get_merged_polygon()
Creates a multipolygon representing all maps in metadata.
- Return type:
None
- get_minmax_latlon()
Prints minimum and maximum latitudes and longitudes of all maps in metadata.
- Return type:
None
- query_map_sheets_by_wfs_ids(wfs_ids, append=False, print=False)
Find map sheets by WFS ID numbers.
- Parameters:
wfs_ids (Union[list, int]) – The WFS ID numbers of the maps to download.
append (bool, optional) – Whether to append to current query results or start over. By default False
print (bool, optional) – Whether to print query results or not. By default False
- Return type:
None
- query_map_sheets_by_polygon(polygon, mode='within', append=False, print=False)
Find map sheets which are found within or intersecting with a defined polygon.
- Parameters:
polygon (Polygon) – shapely Polygon
mode (str, optional) – The mode to use when finding maps. Options are "within", which returns all map sheets which are completely within the defined polygon, and "intersects", which returns all map sheets which intersect/overlap with the defined polygon. By default "within".
append (bool, optional) – Whether to append to current query results or start over. By default False.
print (bool, optional) – Whether to print query results or not. By default False.
- Return type:
None
Notes
Use create_polygon_from_latlons() to create the polygon.
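The difference between the "within" and "intersects" modes can be illustrated with plain bounding boxes (a simplified, hypothetical stand-in for the shapely predicates the library relies on):

```python
def within(sheet, region):
    """True if bounding box `sheet` lies completely inside `region`.
    Boxes are (min_x, min_y, max_x, max_y) tuples; a simplified stand-in
    for shapely's `within` predicate, not MapReader's code."""
    return (sheet[0] >= region[0] and sheet[1] >= region[1]
            and sheet[2] <= region[2] and sheet[3] <= region[3])

def intersects(sheet, region):
    """True if the two boxes overlap at all (stand-in for `intersects`)."""
    return not (sheet[2] < region[0] or sheet[0] > region[2]
                or sheet[3] < region[1] or sheet[1] > region[3])

sheet = (0, 0, 2, 2)
region = (1, 1, 5, 5)
# The sheet overlaps the region's corner but is not fully inside it, so it
# would be returned by mode="intersects" but not by mode="within".
```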
- query_map_sheets_by_coordinates(coords, append=False, print=False)
Find map sheets which contain a defined set of coordinates. Coordinates are (x,y).
- Parameters:
coords (tuple) – Coordinates in (x,y) format.
append (bool, optional) – Whether to append to current query results or start over. By default False.
print (bool, optional) – Whether to print query results or not. By default False.
- Return type:
None
- query_map_sheets_by_line(line, append=False, print=False)
Find map sheets which intersect with a line.
- Parameters:
line (LineString) – shapely LineString
append (bool, optional) – Whether to append to current query results or start over. By default False
print (bool, optional) – Whether to print query results or not. By default False
- Return type:
None
Notes
Use create_line_from_latlons() to create the line.
- query_map_sheets_by_string(string, columns=None, append=False, print=False)
Find map sheets by searching for a string in the metadata.
- Parameters:
string (str) – The string to search for. Can be a raw string and use regular expressions.
columns (str or list, optional) – A column or list of columns to search in. If None, will search in all metadata fields.
append (bool, optional) – Whether to append to current query results or start over. By default False.
print (bool, optional) – Whether to print query results or not. By default False.
- Return type:
None
Notes
string is case insensitive.
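Because the string query is case insensitive, matching behaves as if the pattern were compiled with re.IGNORECASE. The snippet below illustrates the behaviour described above with Python's standard re module; it is not MapReader's internal code, and the sample metadata string is invented.

```python
import re

# Case-insensitive search, as the string queries behave:
pattern = re.compile(r"ordnance survey", re.IGNORECASE)

# Hypothetical metadata field value:
title = "Six-inch Ordnance Survey, Published: 1888"
match = pattern.search(title)  # matches despite the different casing
```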
- print_found_queries()
Prints query results.
- Return type:
None
- download_all_map_sheets(path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)
Downloads all map sheets in metadata.
- Parameters:
path_save (str, optional) – Path to save map sheets, by default “maps”
metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.
**kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.
- Return type:
None
- download_map_sheets_by_wfs_ids(wfs_ids, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)
Downloads map sheets by WFS ID numbers.
- Parameters:
wfs_ids (Union[list, int]) – The WFS ID numbers of the maps to download.
path_save (str, optional) – Path to save map sheets, by default “maps”
metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.
**kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.
- Return type:
None
- download_map_sheets_by_polygon(polygon, path_save='maps', metadata_fname='metadata.csv', mode='within', overwrite=False, download_in_parallel=True, **kwargs)
Downloads any map sheets which are found within or intersecting with a defined polygon.
- Parameters:
polygon (Polygon) – shapely Polygon
path_save (str, optional) – Path to save map sheets, by default “maps”
metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”
mode (str, optional) – The mode to use when finding maps. Options are "within", which returns all map sheets which are completely within the defined polygon, and "intersects", which returns all map sheets which intersect/overlap with the defined polygon. By default "within".
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.
**kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.
- Return type:
None
Notes
Use create_polygon_from_latlons() to create the polygon.
- download_map_sheets_by_coordinates(coords, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)
Downloads any map sheets which contain a defined set of coordinates. Coordinates are (x,y).
- Parameters:
coords (tuple) – Coordinates in (x,y) format.
path_save (str, optional) – Path to save map sheets, by default “maps”
metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.
**kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.
- Return type:
None
- download_map_sheets_by_line(line, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)
Downloads any map sheets which intersect with a line.
- Parameters:
line (LineString) – shapely LineString
path_save (str, optional) – Path to save map sheets, by default “maps”
metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.
**kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.
- Return type:
None
Notes
Use create_line_from_latlons() to create the line.
- download_map_sheets_by_string(string, columns=None, path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)
Download map sheets by searching for a string in the metadata.
- Parameters:
string (str) – The string to search for. Can be a raw string and use regular expressions.
columns (str or list or None, optional) – A column or list of columns to search in. If None, will search in all metadata fields. By default None.
path_save (str, optional) – Path to save map sheets, by default “maps”
metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.
**kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.
- Return type:
None
Notes
string is case insensitive.
- download_map_sheets_by_queries(path_save='maps', metadata_fname='metadata.csv', overwrite=False, download_in_parallel=True, **kwargs)
Downloads map sheets saved as query results.
- Parameters:
path_save (str, optional) – Path to save map sheets, by default “maps”
metadata_fname (str, optional) – Name to use for metadata file, by default “metadata.csv”
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
download_in_parallel (bool, optional) – Whether to download tiles in parallel, by default True.
**kwargs (dict, optional) – Keyword arguments to pass to the _download_map_sheets() method.
- Return type:
None
- plot_features_on_map(features, map_extent=None, add_id=True)
Plot boundaries of map sheets on a map using cartopy.
- Parameters:
map_extent (Union[str, list, tuple, None], optional) – The extent of the underlying map to be plotted. If a tuple or list, must be of the format [lon_min, lon_max, lat_min, lat_max]. If a string, only "uk", "UK" or "United Kingdom" are accepted and will limit the map extent to the UK’s boundaries. If None, the map extent will be set automatically. By default None.
add_id (bool, optional) – Whether to add an ID (WFS ID number) to each map sheet, by default True.
features (geopandas.GeoDataFrame)
- Return type:
None
- plot_all_metadata_on_map(map_extent=None, add_id=True)
Plots boundaries of all map sheets in metadata on a map using cartopy.
- Parameters:
map_extent (Union[str, list, tuple, None], optional) – The extent of the underlying map to be plotted. If a tuple or list, must be of the format [lon_min, lon_max, lat_min, lat_max]. If a string, only "uk", "UK" or "United Kingdom" are accepted and will limit the map extent to the UK’s boundaries. If None, the map extent will be set automatically. By default None.
add_id (bool, optional) – Whether to add an ID (WFS ID number) to each map sheet, by default True.
- Return type:
None
- plot_queries_on_map(map_extent=None, add_id=True)
Plots boundaries of query results on a map using cartopy.
- Parameters:
map_extent (Union[str, list, tuple, None], optional) – The extent of the underlying map to be plotted. If a tuple or list, must be of the format [lon_min, lon_max, lat_min, lat_max]. If a string, only "uk", "UK" or "United Kingdom" are accepted and will limit the map extent to the UK’s boundaries. If None, the map extent will be set automatically. By default None.
add_id (bool, optional) – Whether to add an ID (WFS ID number) to each map sheet, by default True.
- Return type:
None
- class mapreader.Downloader(download_url)
A class to download maps (without using metadata)
- Parameters:
download_url (str | list)
- download_url
- download_map_by_polygon(polygon, zoom_level=14, path_save='maps', overwrite=False, map_name=None, **kwargs)
Downloads a map contained within a polygon.
- Parameters:
polygon (Polygon) – A polygon defining the boundaries of the map
zoom_level (int, optional) – The zoom level to use, by default 14
path_save (str, optional) – Path to save map sheets, by default “maps”
overwrite (bool, optional) – Whether to overwrite existing maps, by default False.
map_name (str, optional) – Name to use when saving the map, by default None.
**kwargs (dict) – Additional keyword arguments to pass to the _download_map method.
- Return type:
None
- mapreader.create_polygon_from_latlons(min_lat, min_lon, max_lat, max_lon)
Creates a polygon from latitudes and longitudes.
- Parameters:
min_lat (float) – minimum latitude
min_lon (float) – minimum longitude
max_lat (float) – maximum latitude
max_lon (float) – maximum longitude
- Returns:
shapely Polygon
- Return type:
Polygon
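Note that shapely geometries take coordinates in (x, y), i.e. (lon, lat), order, while this helper takes latitudes first. The sketch below shows the corner ring such a rectangular polygon is built from; the polygon_corners helper is a hypothetical illustration of the geometry, not the library's source.

```python
def polygon_corners(min_lat, min_lon, max_lat, max_lon):
    """Corner ring for the rectangular polygon, in (lon, lat) order, i.e.
    the (x, y) order shapely expects. Illustrative helper only."""
    return [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # close the ring
    ]

# e.g. a box covering part of southern England:
ring = polygon_corners(50.5, -1.5, 51.5, -0.5)
```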
- mapreader.create_line_from_latlons(lat1_lon1, lat2_lon2)
Creates a line between two points.
- Parameters:
lat1_lon1 (tuple) – Tuple defining the first point.
lat2_lon2 (tuple) – Tuple defining the second point.
- Returns:
shapely LineString
- Return type:
LineString
- class mapreader.MapImages(path_images=None, file_ext=None, tree_level='parent', parent_path=None, **kwargs)
Class to manage a collection of image paths and construct image objects.
- Parameters:
path_images (str or None, optional) – Path to the directory containing images (accepts wildcards). By default, None (no images will be loaded).
file_ext (str or None, optional) – The file extension of the image files to be loaded, ignored if file types are specified in path_images (e.g. with "./path/to/dir/*png"). By default None.
tree_level (str, optional) – Level of the image hierarchy to construct. The value can be "parent" (default) or "patch".
parent_path (str or None, optional) – Path to parent images (if applicable), by default None.
**kwargs (dict, optional) – Keyword arguments to pass to the _images_constructor() method.
- path_images
List of paths to the image files.
- Type:
list
- images
A dictionary containing the constructed image data. It has two levels of hierarchy, "parent" and "patch", depending on the value of the tree_level parameter.
- Type:
dict
- images
- parents
- patches
- georeferenced = False
- check_georeferencing()
- add_metadata(metadata, index_col=0, delimiter=',', usecols=None, tree_level='parent', ignore_mismatch=False)
Add metadata information to the images dictionary property.
- Parameters:
metadata (str or pathlib.Path or pandas.DataFrame or geopandas.GeoDataFrame) – Path to a CSV/TSV/etc., Excel or JSON/GeoJSON file, or a pandas DataFrame or geopandas GeoDataFrame.
index_col (int or str, optional) – Column to use as the index when reading the file and converting it into a pandas.DataFrame. Accepts column indices or column names. By default 0 (first column). Only used if a CSV/TSV file path is provided as the metadata parameter. Ignored if the usecols parameter is passed.
delimiter (str, optional) – Delimiter used in the CSV file, by default ",". Only used if a CSV file path is provided as the metadata parameter.
usecols (list, optional) – List of column indices or names to add to MapImages. If None is passed, all columns will be used. By default None.
tree_level (str, optional) – Determines which images dictionary ("parent" or "patch") to add the metadata to, by default "parent".
ignore_mismatch (bool, optional) – Whether to error if metadata with mismatching information is passed. By default False.
- Raises:
ValueError –
If metadata is not a valid file path, pandas DataFrame or geopandas GeoDataFrame.
If ‘name’ or ‘image_id’ is not one of the columns in the metadata.
- Return type:
None
Notes
Your metadata file must contain a column which contains the image IDs (filenames) of your images. This should have a column name of either name or image_id.
Existing information in your MapImages object will be overwritten if there are overlapping column headings in your metadata file/dataframe.
- show_sample(num_samples, tree_level='patch', random_seed=65, **kwargs)
Display a sample of images from a particular level in the image hierarchy.
- Parameters:
num_samples (int) – The number of images to display.
tree_level (str, optional) – The level of the hierarchy to display images from, which can be "patch" or "parent". By default "patch".
random_seed (int, optional) – The random seed to use for reproducibility. Default is 65.
**kwargs (dict, optional) – Additional keyword arguments to pass to matplotlib.pyplot.figure().
- Returns:
The figure generated
- Return type:
matplotlib.Figure
- list_parents()
Return list of all parents
- Return type:
list[str]
- list_patches()
Return list of all patches
- Return type:
list[str]
- add_shape(tree_level='parent')
Add a shape to each image in the specified level of the image hierarchy.
- Parameters:
tree_level (str, optional) – The level of the hierarchy to add shapes to, either "parent" (default) or "patch".
- Return type:
None
Notes
The method runs _add_shape_id() for each image present at the tree_level provided.
- add_coords_from_grid_bb(verbose=False)
Adds coordinates to parent images using their grid bounding boxes.
- Parameters:
verbose (bool) – Whether to print verbose outputs.
- Return type:
None
- add_coord_increments(verbose=False)
Adds coordinate increments to each image at the parent level.
- Parameters:
verbose (bool, optional) – Whether to print verbose outputs, by default False.
- Return type:
None
Notes
The method runs _add_coord_increments_id() for each image present at the parent level, which calculates pixel-wise delta longitude (dlon) and delta latitude (dlat) for the image and adds the data to it.
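The pixel-wise increments described above amount to dividing the image's coordinate extent by its pixel dimensions. The helper below is a sketch of that idea; the function name, and the assumption that the bounding box is stored as (min_lon, min_lat, max_lon, max_lat), are illustrative, not MapReader's exact code.

```python
def coord_increments(coords, width, height):
    """Pixel-wise coordinate increments (dlon, dlat) for an image whose
    bounding box is coords = (min_lon, min_lat, max_lon, max_lat).
    Illustrative sketch only."""
    min_lon, min_lat, max_lon, max_lat = coords
    dlon = (max_lon - min_lon) / width   # degrees of longitude per pixel
    dlat = (max_lat - min_lat) / height  # degrees of latitude per pixel
    return dlon, dlat

# A 1-degree-square sheet rendered at 100 x 200 pixels:
dlon, dlat = coord_increments((-1.0, 51.0, 0.0, 52.0), 100, 200)
```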
- add_patch_coords(verbose=False)
Add coordinates to all patches in patches dictionary.
- Parameters:
verbose (bool, optional) – Whether to print verbose outputs. By default, False.
- Return type:
None
- add_patch_polygons(verbose=False)
Add polygon to all patches in patches dictionary.
- Parameters:
verbose (bool, optional) – Whether to print verbose outputs. By default, False.
- Return type:
None
- add_center_coord(tree_level='patch', verbose=False)
Adds center coordinates to each image at the specified tree level.
- Parameters:
tree_level (str, optional) – The tree level where the center coordinates will be added. It can be either "parent" or "patch" (default).
verbose (bool, optional) – Whether to print verbose outputs, by default False.
- Return type:
None
Notes
The method runs _add_center_coord_id() for each image present at the tree_level provided, which calculates the central longitude and latitude (center_lon and center_lat) for the image and adds the data to it.
- add_parent_polygons(verbose=False)
Add polygon to all parents in parents dictionary.
- Parameters:
verbose (bool, optional) – Whether to print verbose outputs. By default, False.
- Return type:
None
- patchify_all(method='pixel', patch_size=100, tree_level='parent', path_save=None, add_to_parents=True, square_cuts=False, resize_factor=False, output_format='png', rewrite=False, verbose=False, overlap=0)
Patchify all images in the specified tree_level and (if add_to_parents=True) add the patches to the MapImages instance’s images dictionary.
- Parameters:
method (str, optional) – Method used to patchify images, choices between "pixel" (default) and "meters" or "meter".
patch_size (int, optional) – Number of pixels/meters in both x and y to use for slicing, by default 100.
tree_level (str, optional) – Tree level, choices between "parent" or "patch", by default "parent".
path_save (str, optional) – Directory to save the patches. If None, will be set as f"patches_{patch_size}_{method}" (e.g. "patches_100_pixel"). By default None.
add_to_parents (bool, optional) – If True, patches will be added to the MapImages instance’s images dictionary, by default True.
square_cuts (bool, optional) – If True, all patches will have the same number of pixels in x and y, by default False.
resize_factor (bool, optional) – If True, resize the images before patchifying, by default False.
output_format (str, optional) – Format to use when writing image files, by default "png".
rewrite (bool, optional) – If True, existing patches will be rewritten, by default False.
verbose (bool, optional) – If True, progress updates will be printed throughout, by default False.
overlap (int, optional) – Fractional overlap between patches, by default 0.
- Return type:
None
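With method="pixel" and no overlap, patchifying amounts to slicing the image into a grid of patch_size x patch_size tiles. The sketch below illustrates the bounds such a grid produces; it is not MapReader's implementation (in particular, edge patches are simply truncated here, whereas square_cuts changes that behaviour).

```python
def patch_grid(width, height, patch_size):
    """Pixel bounds (min_x, min_y, max_x, max_y) for slicing an image into
    patch_size x patch_size patches with no overlap. Illustrative sketch
    of the "pixel" method; edge patches are truncated to the image."""
    bounds = []
    for min_y in range(0, height, patch_size):
        for min_x in range(0, width, patch_size):
            bounds.append((min_x, min_y,
                           min(min_x + patch_size, width),
                           min(min_y + patch_size, height)))
    return bounds

# A 250 x 100 image with patch_size=100 yields a 3 x 1 grid; the last
# column is truncated to 50 pixels wide.
grid = patch_grid(250, 100, 100)
```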
- calc_pixel_stats(parent_id=None, calc_mean=True, calc_std=True, verbose=False)
Calculate the mean and standard deviation of pixel values for all channels of all patches of a given parent image. Store the results in the MapImages instance’s images dictionary.
- Parameters:
parent_id (str or None, optional) – The ID of the parent image to calculate pixel stats for. If None, calculate pixel stats for all parent images. By default, None.
calc_mean (bool, optional) – Whether to calculate mean pixel values. By default, True.
calc_std (bool, optional) – Whether to calculate standard deviation of pixel values. By default, True.
verbose (bool, optional) – Whether to print verbose outputs. By default, False.
- Return type:
None
Notes
Pixel stats are calculated for patches of the parent image specified by parent_id. If parent_id is None, pixel stats are calculated for all parent images in the object.
If the mean or standard deviation of pixel values has already been calculated for a patch, the calculation is skipped.
Pixel stats are stored in the images attribute of the MapImages instance, under the patch key for each patch.
If no patches are found for a parent image, a warning message is displayed and the method moves on to the next parent image.
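The quantities that calc_mean and calc_std switch on are the per-channel mean and standard deviation of pixel values. Computed in plain Python for one channel, as an illustration rather than MapReader's code:

```python
def channel_mean_std(pixels):
    """Mean and (population) standard deviation of one channel's pixel
    values. Illustrative sketch of what calc_mean / calc_std compute."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = (sum((p - mean) ** 2 for p in pixels) / n) ** 0.5
    return mean, std

# A tiny two-pixel "channel":
stats = channel_mean_std([0, 10])
```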
- convert_images(save=False, save_format='csv', delimiter=',')
Convert the MapImages instance’s images dictionary into pandas DataFrames (or geopandas GeoDataFrames if georeferenced) for easy manipulation.
- Parameters:
save (bool, optional) – Whether to save the dataframes as files. By default False.
save_format (str, optional) – If save=True, the file format to use when saving the dataframes. Options of csv ("csv"), excel ("excel" or "xlsx") or geojson ("geojson"). By default, "csv".
delimiter (str, optional) – The delimiter to use when saving the dataframe. By default ",".
- Returns:
The method returns a tuple of two DataFrames/GeoDataFrames: one for the parent images and one for the patch images.
- Return type:
tuple of two pandas DataFrames or geopandas GeoDataFrames
- show_parent(parent_id, column_to_plot=None, **kwargs)
A wrapper method for show() which plots all patches of a specified parent (parent_id).
- Parameters:
parent_id (str) – ID of the parent image whose patches will be plotted.
column_to_plot (str, optional) – Column whose values will be plotted on patches, by default None.
**kwargs (dict, optional) – Additional keyword arguments to pass to show().
- Returns:
A list of figures created by the method.
- Return type:
list
Notes
This is a wrapper method. See the documentation of the show() method for more detail.
- show(image_ids, column_to_plot=None, figsize=(10, 10), plot_parent=True, patch_border=True, border_color='r', vmin=None, vmax=None, alpha=1.0, cmap='viridis', discrete_cmap=256, plot_histogram=False, save_kml_dir=False, image_width_resolution=None, kml_dpi_image=None)
Plot images from a list of image_ids.
- Parameters:
image_ids (str or list) – Image ID or list of image IDs to be plotted.
column_to_plot (str, optional) – Column whose values will be plotted on patches, by default None.
figsize (tuple, optional) – The size of the figure to be plotted. By default, (10, 10).
plot_parent (bool, optional) – If True, parent image will be plotted in background, by default True.
patch_border (bool, optional) – If True, a border will be placed around each patch, by default True.
border_color (str, optional) – The color of the border. Default is "r".
vmin (float, optional) – The minimum value for the colormap. If None, will be set to the minimum value in column_to_plot, by default None.
vmax (float, optional) – The maximum value for the colormap. If None, will be set to the maximum value in column_to_plot, by default None.
alpha (float, optional) – Transparency level for plotting values, with floating point values ranging from 0.0 (transparent) to 1.0 (opaque). By default 1.0.
cmap (str, optional) – Color map used to visualize the chosen column_to_plot values, by default "viridis".
discrete_cmap (int, optional) – Number of discrete colors to use in the color map, by default 256.
plot_histogram (bool, optional) – If True, plot histograms of the values of the images. By default False.
save_kml_dir (str or bool, optional) – If True, save KML files of the images. If a string is provided, it is the path to the directory in which to save the KML files. If set to False, no files are saved. By default False.
image_width_resolution (int or None, optional) – The pixel width to be used for plotting. If None, the resolution is not changed. Default is None. Note: only relevant when tree_level="parent".
kml_dpi_image (int or None, optional) – The resolution, in dots per inch, to create KML images when save_kml_dir is specified (as either True or with a path). By default None.
- Returns:
A list of figures created by the method.
- Return type:
list
- load_patches(patch_paths, parent_paths=False, patch_file_ext=False, parent_file_ext=False, add_geo_info=False, clear_images=False)
Loads patch images from the given paths and adds them to the images dictionary in the MapImages instance.
- Parameters:
patch_paths (str) – The file path of the patches to be loaded. Note: the patch_paths parameter accepts wildcards.
parent_paths (str or bool, optional) – The file path of the parent images to be loaded. If set to False, no parents are loaded. Default is False. Note: the parent_paths parameter accepts wildcards.
patch_file_ext (str or bool, optional) – The file extension of the patches to be loaded, ignored if file extensions are specified in patch_paths (e.g. with "./path/to/dir/*png"). By default False.
parent_file_ext (str or bool, optional) – The file extension of the parent images, ignored if file extensions are specified in parent_paths (e.g. with "./path/to/dir/*png"). By default False.
add_geo_info (bool, optional) – If True, adds geographic information to the parent image. Default is False.
clear_images (bool, optional) – If True, clears the images from the images dictionary before loading. Default is False.
- Return type:
None
- static detect_parent_id_from_path(image_id, parent_delimiter='#')
Detect parent IDs from image_id.
- Parameters:
image_id (int or str) – ID of patch.
parent_delimiter (str, optional) – Delimiter used to separate the parent ID when naming the patch, by default "#".
- Returns:
Parent ID.
- Return type:
str
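The delimiter convention can be illustrated as follows: everything after the delimiter is treated as the parent ID. Both the helper and the sample patch name below are hypothetical illustrations of that convention, not MapReader's internal code.

```python
def parent_id_from_patch(patch_id, delimiter="#"):
    """Take the text after the delimiter as the parent ID.
    Hypothetical sketch of the naming convention described above."""
    return patch_id.split(delimiter)[-1]

# e.g. with a hypothetical patch name:
# parent_id_from_patch("patch-0-0-100-100-#map_1.png") -> "map_1.png"
```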
- static detect_pixel_bounds_from_path(image_id)
Detects borders from the path assuming patch is named using the following format:
...-min_x-min_y-max_x-max_y-...
- Parameters:
image_id (int or str) – ID of image.
border_delimiter (str, optional) – Delimiter used to separate border values when naming the patch image, by default "-".
- Returns:
Border (min_x, min_y, max_x, max_y) of image
- Return type:
tuple of min_x, min_y, max_x, max_y
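A name following the ...-min_x-min_y-max_x-max_y-... format can be parsed by splitting on the delimiter and keeping the numeric segments. The helper and sample name below are a sketch of that convention, not the library's parsing code.

```python
def pixel_bounds_from_name(patch_id, delimiter="-"):
    """Parse (min_x, min_y, max_x, max_y) from a patch name of the form
    ...-min_x-min_y-max_x-max_y-... . Illustrative sketch only: assumes
    the first four purely numeric segments are the bounds."""
    numbers = [p for p in patch_id.split(delimiter) if p.isdigit()]
    return tuple(int(n) for n in numbers[:4])

# e.g. with a hypothetical patch name:
# pixel_bounds_from_name("patch-0-0-100-100-#map_1.png") -> (0, 0, 100, 100)
```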
- load_parents(parent_paths=False, parent_ids=False, parent_file_ext=False, overwrite=False, add_geo_info=False)
Load parent images from file paths (parent_paths).
If parent_paths is not given but parent_ids is, only the IDs are added; no image paths will be added to the images.
- Parameters:
parent_paths (str or bool, optional) – Path to parent images, by default False.
parent_ids (list, str or bool, optional) – ID(s) of parent images. Ignored if parent_paths are specified. By default False.
parent_file_ext (str or bool, optional) – The file extension of the parent images, ignored if file extensions are specified in parent_paths (e.g. with "./path/to/dir/*png"). By default False.
overwrite (bool, optional) – If True, current parents will be overwritten, by default False.
add_geo_info (bool, optional) – If True, geographical info will be added to parents, by default False.
- Return type:
None
- load_df(parent_df=None, patch_df=None, clear_images=True)
Create a MapImages instance by loading data from pandas DataFrame(s).
- Parameters:
parent_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – DataFrame containing parents or path to parents, by default None.
patch_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – DataFrame containing patches, by default None.
clear_images (bool, optional) – If True, clear images before reading the dataframes, by default True.
- Return type:
None
- load_csv(parent_path=None, patch_path=None, clear_images=False, index_col_patch=0, index_col_parent=0, delimiter=',')
Load CSV files containing information about parent and patch images, and update the images attribute of the MapImages instance with the loaded data.
- Parameters:
parent_path (str or pathlib.Path or None) – Path to the CSV file containing parent image information. By default, None.
patch_path (str or pathlib.Path or None) – Path to the CSV file containing patch information. By default, None.
clear_images (bool, optional) – If True, clear all previously loaded image information before loading new information. Default is False.
index_col_patch (int or str or None, optional) – Column to set as index for the patch DataFrame, by default 0.
index_col_parent (int or str or None, optional) – Column to set as index for the parent DataFrame, by default 0.
delimiter (str, optional) – The delimiter to use when reading the dataframe. By default ",".
- Return type:
None
- add_geo_info(target_crs='EPSG:4326', verbose=True)
Add coordinates (reprojected to EPSG:4326) to all parent images using image metadata.
- Parameters:
target_crs (str, optional) – Projection to convert coordinates into, by default "EPSG:4326".
verbose (bool, optional) – Whether to print verbose output, by default True.
- Return type:
None
Notes
For each image in the parents dictionary, this method calls _add_geo_info_id and adds coordinates (if present) to the image in the parent dictionary.
- save_parents_as_geotiffs(rewrite=False, verbose=False, crs=None)
Save all parents in the MapImages instance as geotiffs.
- Parameters:
rewrite (bool, optional) – Whether to rewrite files if they already exist, by default False.
verbose (bool, optional) – Whether to print verbose outputs, by default False.
crs (str, optional) – The CRS of the coordinates. If None, the method will first look for crs in the parents dictionary and use those. If crs cannot be found in the dictionary, the method will use "EPSG:4326". By default None.
- Return type:
None
- save_patches_as_geotiffs(rewrite=False, verbose=False, crs=None)
Save all patches in the MapImages instance as geotiffs.
- Parameters:
rewrite (bool, optional) – Whether to rewrite files if they already exist, by default False.
verbose (bool, optional) – Whether to print verbose outputs, by default False.
crs (str, optional) – The CRS of the coordinates. If None, the method will first look for crs in the patches dictionary and use those. If crs cannot be found in the dictionary, the method will use "EPSG:4326". By default None.
- Return type:
None
- save_patches_to_geojson(geojson_fname='patches.geojson', rewrite=False, crs=None)
Saves patches to a geojson file.
- Parameters:
geojson_fname (Optional[str], optional) – The name of the geojson file, by default “patches.geojson”
rewrite (Optional[bool], optional) – Whether to overwrite an existing file, by default False.
crs (Optional[str], optional) – The CRS to use when writing the geojson. If None, the method will look for “crs” in the patches dictionary and, if found, will use that. Otherwise it will set the crs to the default value of “EPSG:4326”. By default None
- Return type:
None
- mapreader.loader(path_images=None, tree_level='parent', parent_path=None, **kwargs)
Creates a MapImages class to manage a collection of image paths and construct image objects.
- Parameters:
path_images (str or None, optional) – Path to the directory containing images (accepts wildcards). By default, None.
tree_level (str, optional) – Level of the image hierarchy to construct. The value can be "parent" (default) or "patch".
parent_path (str, optional) – Path to parent images (if applicable), by default None.
**kwargs (dict, optional) – Additional keyword arguments to be passed to the _images_constructor() method.
- Returns:
The MapImages class which can manage a collection of image paths and construct image objects.
- Return type:
MapImages
Notes
This is a wrapper method. See the documentation of the MapImages class for more detail.
- mapreader.load_patches(patch_paths, patch_file_ext=False, parent_paths=False, parent_file_ext=False, add_geo_info=False, clear_images=False)
Creates a MapImages class to manage a collection of image paths and construct image objects. Then loads patch images from the given paths and adds them to the images dictionary in the MapImages instance.
- Parameters:
patch_paths (str) –
The file path of the patches to be loaded.
Note: The ``patch_paths`` parameter accepts wildcards.
patch_file_ext (str or bool, optional) – The file extension of the patches, ignored if file extensions are specified in patch_paths (e.g. with "./path/to/dir/*png"). By default False.
parent_paths (str or bool, optional) –
The file path of the parent images to be loaded. If set to False, no parents are loaded. Default is False.
Note: The ``parent_paths`` parameter accepts wildcards.
parent_file_ext (str or bool, optional) – The file extension of the parent images, ignored if file extensions are specified in parent_paths (e.g. with "./path/to/dir/*png"). By default False.
add_geo_info (bool, optional) – If True, adds geographic information to the parent image. Default is False.
clear_images (bool, optional) – If True, clears the images from the images dictionary before loading. Default is False.
- Returns:
The MapImages class which can manage a collection of image paths and construct image objects.
- Return type:
MapImages
Notes
This is a wrapper method. See the documentation of the MapImages class for more detail.
This function, in particular, also calls the load_patches() method. Please see the documentation for that method for more information.
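The wildcard handling described above can be pictured with plain glob matching. A minimal, self-contained sketch (the file names are hypothetical and not part of the mapreader API):

```python
import glob
import tempfile
from pathlib import Path

# Create a few hypothetical patch files to match against.
tmp_dir = Path(tempfile.mkdtemp())
for name in ["patch-0-0.png", "patch-0-100.png", "notes.txt"]:
    (tmp_dir / name).touch()

# A "./path/to/dir/*png"-style pattern already selects only image files,
# which is why patch_file_ext is ignored when an extension is in the pattern.
patch_paths = sorted(glob.glob(str(tmp_dir / "*png")))
print([Path(p).name for p in patch_paths])  # ['patch-0-0.png', 'patch-0-100.png']
```
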
- class mapreader.Annotator(patch_df=None, parent_df=None, labels=None, patch_paths=None, parent_paths=None, metadata_path=None, annotations_dir='./annotations', patch_paths_col='image_path', label_col='label', show_context=False, border=False, auto_save=True, delimiter=',', sortby=None, ascending=True, username=None, task_name=None, min_values=None, max_values=None, filter_for=None, surrounding=1, max_size=1000, resize_to=None)
Annotator class for annotating patches with labels.
- Parameters:
patch_df (str, pathlib.Path, pd.DataFrame or gpd.GeoDataFrame or None, optional) – Path to a CSV/geojson file or a pandas DataFrame/ geopandas GeoDataFrame containing patch data, by default None
parent_df (str, pathlib.Path, pd.DataFrame or gpd.GeoDataFrame or None, optional) – Path to a CSV/geojson file or a pandas DataFrame/ geopandas GeoDataFrame containing parent data, by default None
labels (list, optional) – List of labels for annotation, by default None
patch_paths (str or None, optional) – Path to patch images, by default None. Ignored if patch_df is provided.
parent_paths (str or None, optional) – Path to parent images, by default None. Ignored if parent_df is provided.
metadata_path (str or None, optional) – Path to metadata CSV file, by default None
annotations_dir (str, optional) – Directory to store annotations, by default “./annotations”
patch_paths_col (str, optional) – Name of the column in which image paths are stored in patch DataFrame, by default “image_path”
label_col (str, optional) – Name of the column in which labels are stored in patch DataFrame, by default “label”
show_context (bool, optional) – Whether to show context when loading patches, by default False
border (bool, optional) – Whether to add a border around the central patch when showing context, by default False
auto_save (bool, optional) – Whether to automatically save annotations, by default True
delimiter (str, optional) – Delimiter used in CSV files, by default “,”
sortby (str or None, optional) – Name of the column to use to sort the patch DataFrame, by default None. Default sort order is ascending=True. Pass the ascending=False keyword argument to sort in descending order.
ascending (bool, optional) – Whether to sort the DataFrame in ascending order when using the sortby argument, by default True.
username (str or None, optional) – Username to use when saving the annotations file, by default None. If not provided, a random string is generated.
task_name (str or None, optional) – Name of the annotation task, by default None.
min_values (dict, optional) – A dictionary consisting of column names (keys) and minimum values as floating point values (values), by default None.
max_values (dict, optional) – A dictionary consisting of column names (keys) and maximum values as floating point values (values), by default None.
filter_for (dict, optional) – A dictionary consisting of column names (keys) and values to filter for (values), by default None.
surrounding (int, optional) – The number of surrounding images to show for context, by default 1.
max_size (int, optional) – The size in pixels to which to constrain the longest side of each patch image, by default 1000.
resize_to (int or None, optional) – The size in pixels to which to resize the longest side of each patch image, by default None.
- Raises:
FileNotFoundError – If the provided patch_df or parent_df file path does not exist
ValueError – If patch_df or parent_df is not a valid path to a CSV/geojson file, a pandas DataFrame or a geopandas GeoDataFrame; if neither patch_df nor patch_paths is provided; if the DataFrame does not have the required columns; if sortby is not a string or None; or if the labels provided are not in the form of a list.
SyntaxError – If labels provided are not in the form of a list
- image_list
- id
- annotations_file
- labels
- patch_df
- label_col
- patch_paths_col
- show_context
- border
- auto_save
- username
- task_name
- surrounding
- max_size
- resize_to
- current_index
- previous_index = 0
- annotate(show_context=None, border=None, sortby=None, ascending=None, min_values=None, max_values=None, surrounding=None, resize_to=None, max_size=None, show_vals=None)
Annotate patches, starting from the current patch. Renders the annotation interface for the first image.
- Parameters:
show_context (bool or None, optional) – Whether or not to display the surrounding context for each image. Default is None.
border (bool or None, optional) – Whether or not to display a border around the image (when using show_context). Default is None.
sortby (str or None, optional) – Name of the column to use to sort the patch DataFrame, by default None. Default sort order is ascending=True. Pass the ascending=False keyword argument to sort in descending order.
ascending (bool, optional) – Whether to sort the DataFrame in ascending order when using the sortby argument, by default True.
min_values (dict or None, optional) – Minimum values for each property to filter images for annotation. It should be provided as a dictionary consisting of column names (keys) and minimum values as floating point values (values). Default is None.
max_values (dict or None, optional) – Maximum values for each property to filter images for annotation. It should be provided as a dictionary consisting of column names (keys) and maximum values as floating point values (values). Default is None.
surrounding (int or None, optional) – The number of surrounding images to show for context. Default: 1.
max_size (int or None, optional) – The size in pixels to which to constrain the longest side of each patch image. Default: 100.
resize_to (int or None, optional) – The size in pixels for the longest side to which resize each patch image. Default: None.
show_vals (list[str] or None, optional) – List of column names to show in the display. By default, None.
- Return type:
None
Notes
This method is a wrapper for the _annotate() method.
- get_patch_image(ix)
Returns the image at the given index.
- Parameters:
ix (int | str) – The index of the image in the dataframe.
- Returns:
A PIL.Image object of the image at the given index.
- Return type:
PIL.Image
- get_labelled_data(sort=True, index_labels=False, include_paths=True)
Returns the annotations made so far.
- Parameters:
sort (bool, optional) – Whether to sort the dataframe by the order of the images in the input data, by default True
index_labels (bool, optional) – Whether to return the label’s index number (in the labels list provided in setting up the instance) or the human-readable label for each row, by default False
include_paths (bool, optional) – Whether to return a column containing the full path to the annotated image or not, by default True
- Returns:
A DataFrame/GeoDataFrame containing the labelled images and their associated label index.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
- property filtered: pandas.DataFrame
- Return type:
pandas.DataFrame
- class mapreader.AnnotationsLoader
A class for loading annotations and preparing datasets and dataloaders for use in training/validation of a model.
- annotations
- labels_map
- reviewed
- patch_paths_col = None
- label_col = None
- datasets = None
- load(annotations, labels_map=None, delimiter=',', images_dir=None, remove_broken=True, ignore_broken=False, patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)
Loads annotations from a CSV/TSV/geojson file, a pandas DataFrame or a geopandas GeoDataFrame. Sets the patch_paths_col and label_col attributes.
- Parameters:
annotations (str | pathlib.Path | pd.DataFrame | gpd.GeoDataFrame) – The annotations. Can either be the path to a CSV/TSV/geojson file, a pandas DataFrame or a geopandas GeoDataFrame.
labels_map (Optional[dict], optional) – A dictionary mapping labels to indices. If not provided, labels will be mapped to indices based on the order in which they appear in the annotations dataframe. By default None.
delimiter (str, optional) – The delimiter to use when loading the csv file as a DataFrame, by default “,”.
images_dir (str, optional) – The path to the directory in which patches are stored. This argument should be passed if image paths are different from the path saved in existing annotations. If None, no updates will be made to the image paths in the annotations DataFrame/csv. By default None.
remove_broken (bool, optional) – Whether to remove annotations with broken image paths. If False, annotations with broken paths will remain in annotations DataFrame and may cause issues! By default True.
ignore_broken (bool, optional) – Whether to ignore broken image paths (only valid if remove_broken=False). If True, annotations with broken paths will remain in the annotations DataFrame and no error will be raised. This may cause issues! If False, annotations with broken paths will raise an error. By default, False.
patch_paths_col (str, optional) – The name of the column containing the image paths, by default “image_path”.
label_col (str, optional) – The name of the column containing the image labels, by default “label”.
append (bool, optional) – Whether to append the annotations to a pre-existing annotations DataFrame. If False, the existing DataFrame will be overwritten. By default True.
scramble_frame (bool, optional) – Whether to shuffle the rows of the DataFrame, by default False.
reset_index (bool, optional) – Whether to reset the index of the DataFrame (e.g. after shuffling), by default False.
- Raises:
ValueError – If annotations is passed as something other than a string or pd.DataFrame.
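When labels_map is not supplied, labels are mapped to indices in the order in which they first appear in the annotations. A minimal sketch of that default behaviour (the label values are hypothetical):

```python
# Hypothetical label column from an annotations DataFrame.
labels = ["rail_space", "no_rail_space", "rail_space", "building"]

# Map each label to an index based on first appearance,
# mirroring the documented default when labels_map is None.
labels_map = {}
for label in labels:
    if label not in labels_map:
        labels_map[label] = len(labels_map)

print(labels_map)  # {'rail_space': 0, 'no_rail_space': 1, 'building': 2}
```
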
- show_patch(patch_id)
Display a patch and its label.
- Parameters:
patch_id (str) – The image ID of the patch to show.
- Return type:
None
- print_unique_labels()
Prints the unique labels.
- Raises:
ValueError – If no annotations are found.
- Return type:
None
- review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')
Perform image review on annotations and update labels for a given label or all labels.
- Parameters:
label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed. By default None.
chunks (int, optional) – The number of images to display at a time, by default 24.
num_cols (int, optional) – The number of columns in the display, by default 8.
exclude_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – A DataFrame of images to exclude from review, by default None.
include_df (pandas.DataFrame or gpd.GeoDataFrame or None, optional) – A DataFrame of images to include for review, by default None.
deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".
- Return type:
None
Notes
This method reviews images with their corresponding labels and allows the user to change the label for each image.
Updated labels are saved in annotations and in a newly created reviewed DataFrame.
If exclude_df is provided, images found in this df are skipped in the review process.
If include_df is provided, only images found in this df are reviewed.
The reviewed DataFrame is deduplicated based on the deduplicate_col.
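The deduplication step can be pictured as keeping one row per image ID. A simplified sketch, assuming later reviews replace earlier ones (the row data is hypothetical, and the real method works on a DataFrame rather than tuples):

```python
# Hypothetical reviewed rows as (image_id, label); later rows are newer reviews.
reviewed = [
    ("patch-1", "rail_space"),
    ("patch-2", "no_rail_space"),
    ("patch-1", "no_rail_space"),  # patch-1 re-reviewed with a new label
]

# Deduplicate on image_id: dict() keeps the last value seen for each key.
deduplicated = dict(reviewed)
print(deduplicated)  # {'patch-1': 'no_rail_space', 'patch-2': 'no_rail_space'}
```
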
- show_sample(label_to_show, num_samples=9)
Show a random sample of images with the specified label.
- Parameters:
label_to_show (str, optional) – The label of the images to show.
num_samples (int or None, optional) – The number of images to show. If None, all images with the specified label will be shown. Default is 9.
- Return type:
None
- create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test', context_datasets=False, context_df=None)
Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.
- Parameters:
frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.
frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.
frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.
random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.
train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.
val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.
test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.
context_datasets (bool, optional) – Whether to create context datasets or not. By default False.
context_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame, optional) – The DataFrame containing all patches if using context datasets. Used to create context images. By default None.
- Raises:
ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.
- Return type:
None
Notes
This method saves the split datasets as a dictionary in self.datasets.
The split follows the fractional ratios provided by the user, and each subset is stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in the column). It performs this splitting by running train_test_split() twice.
See PatchDataset for more information on transforms.
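The two-stage split can be sketched in plain Python. This is a simplified stand-in for the stratified train_test_split() calls (it omits stratification; the items and seed are illustrative):

```python
import random

def two_stage_split(items, frac_train=0.7, frac_val=0.15, frac_test=0.15, seed=1364):
    """Split items into train/val/test by running one split twice."""
    assert abs(frac_train + frac_val + frac_test - 1.0) < 1e-9
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    # First split: train vs. the rest.
    n_train = round(frac_train * len(shuffled))
    train, rest = shuffled[:n_train], shuffled[n_train:]
    # Second split: divide the rest between val and test.
    n_val = round(frac_val / (frac_val + frac_test) * len(rest))
    return {"train": train, "val": rest[:n_val], "test": rest[n_val:]}

datasets = two_stage_split(list(range(100)))
print({k: len(v) for k, v in datasets.items()})  # {'train': 70, 'val': 15, 'test': 15}
```
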
- create_patch_datasets(train_transform, val_transform, test_transform, df_train, df_val, df_test)
- create_patch_context_datasets(context_df, train_transform, val_transform, test_transform, df_train, df_val, df_test)
- create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)
Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders and returns it.
- Parameters:
batch_size (int, optional) – The batch size to use for the dataloader. By default 16.
sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset. By default "default".
shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.
num_workers (int, optional) – The number of worker threads to use for loading data. By default 0.
**kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.
- Returns:
Dictionary containing dataloaders.
- Return type:
Dict
Notes
sampler will only be applied to the training dataset (datasets[“train”]).
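The batching behaviour can be pictured with a minimal generator. This is a pure-Python stand-in for PyTorch's DataLoader, not the real class:

```python
import random

def iter_batches(dataset, batch_size=16, shuffle=False, seed=0):
    """Yield lists of items from dataset in batches of at most batch_size."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# 50 items at batch_size=16 gives three full batches and one partial batch.
batches = list(iter_batches(list(range(50)), batch_size=16))
print([len(b) for b in batches])  # [16, 16, 16, 2]
```
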
- class mapreader.PatchDataset(patch_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB')
Bases:
torch.utils.data.Dataset
A PyTorch Dataset class for loading image patches from a DataFrame.
- Parameters:
patch_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame) – DataFrame or path to CSV/TSV/geojson file containing the paths to image patches and their labels.
transform (Union[str, transforms.Compose, Callable]) – The transform to use on the image. A string can be used to call default transforms - options are “train”, “test” or “val”. Alternatively, a callable object (e.g. a torchvision transform or torchvision.transforms.Compose) that takes in an image and performs image transformations can be used. At minimum, transform should be torchvision.transforms.ToTensor().
delimiter (str, optional) – The delimiter to use when reading the CSV/TSV file. By default ",".
patch_paths_col (str, optional) – The name of the column in the DataFrame containing the image paths. Default is “image_path”.
label_col (str, optional) – The name of the column containing the image labels. Default is None.
label_index_col (str, optional) – The name of the column containing the indices of the image labels. Default is None.
image_mode (str, optional) – The color format to convert the image to. Default is “RGB”.
- patch_df
DataFrame containing the paths to image patches and their labels.
- Type:
pandas.DataFrame or gpd.GeoDataFrame
- label_col
The name of the column containing the image labels.
- Type:
str
- label_index_col
The name of the column containing the labels indices.
- Type:
str
- patch_paths_col
The name of the column in the DataFrame containing the image paths.
- Type:
str
- image_mode
The color format to convert the image to.
- Type:
str
- unique_labels
The unique labels in the label column of the patch_df DataFrame.
- Type:
list
- transform
A callable object (a torchvision transform) that takes in an image and performs image transformations.
- Type:
callable
- __len__()
Returns the length of the dataset.
- Return type:
int
- __getitem__(idx)
Retrieves the image, its label and the index of that label at the given index in the dataset.
- Parameters:
idx (int | torch.Tensor)
- Return type:
tuple[tuple[torch.Tensor], str, int]
- return_orig_image(idx)
Retrieves the original image at the given index in the dataset.
- Parameters:
idx (int | torch.Tensor)
- Return type:
PIL.Image
- _default_transform(t_type, resize2)
Returns a transforms.Compose containing the default image transformations for the train and validation sets.
- Parameters:
t_type (str | None)
resize2 (int | tuple[int, int] | None)
- Return type:
torchvision.transforms.Compose
- Raises:
ValueError – If label_col is not in patch_df.
ValueError – If label_index_col is not in patch_df.
ValueError – If transform is passed as a string, but is not one of “train”, “test” or “val”.
- Parameters:
patch_df (str | pathlib.Path | pandas.DataFrame | geopandas.GeoDataFrame)
transform (str | torchvision.transforms.Compose | Callable)
delimiter (str)
patch_paths_col (str | None)
label_col (str | None)
label_index_col (str | None)
image_mode (str | None)
- label_col
- label_index_col
- image_mode
- patch_paths_col
- unique_labels = []
- return_orig_image(idx)
Return the original image associated with the given index.
- Parameters:
idx (int or Tensor) – The index of the desired image, or a Tensor containing the index.
- Returns:
The original image associated with the given index.
- Return type:
PIL.Image.Image
Notes
This method returns the original image associated with the given index by loading the image file using the file path stored in the patch_paths_col column of the patch_df DataFrame at the given index. The loaded image is then converted to the format specified by the image_mode attribute of the object. The resulting PIL.Image.Image object is returned.
- create_dataloaders(set_name='infer', batch_size=16, shuffle=False, num_workers=0, **kwargs)
Creates a dictionary containing a PyTorch dataloader.
- Parameters:
set_name (str, optional) – The name to use for the dataloader. By default “infer”.
batch_size (int, optional) – The batch size to use for the dataloader. By default 16.
shuffle (bool, optional) – Whether to shuffle the PatchDataset, by default False.
num_workers (int, optional) – The number of worker threads to use for loading data. By default 0.
**kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.
- Returns:
Dictionary containing dataloaders.
- Return type:
Dict
- class mapreader.PatchContextDataset(patch_df, total_df, transform, delimiter=',', patch_paths_col='image_path', label_col=None, label_index_col=None, image_mode='RGB', context_dir='./maps/maps_context', create_context=False, parent_path='./maps')
Bases:
PatchDataset
A PyTorch Dataset class for loading contextual information about image patches from a DataFrame.
- Parameters:
patch_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame) – DataFrame or path to CSV/TSV/geojson file containing the paths to image patches and their labels.
total_df (str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame) – DataFrame or path to CSV/TSV/geojson file containing the paths to all images and their labels.
transform (str) – Torchvision transform to be applied to context images. Either “train” or “val”.
delimiter (str) – The delimiter to use when reading the CSV/TSV file. By default ",".
patch_paths_col (str, optional) – The name of the column in the DataFrame containing the image paths. Default is “image_path”.
label_col (str, optional) – The name of the column containing the image labels. Default is None.
label_index_col (str, optional) – The name of the column containing the indices of the image labels. Default is None.
image_mode (str, optional) – The color space of the images. Default is “RGB”.
context_dir (str, optional) – The path to context maps (or, where to save context if not created yet). Default is “./maps/maps_context”.
create_context (bool, optional) – Whether or not to create context maps. Default is False.
parent_path (str, optional) – The path to the directory containing parent images. Default is “./maps”.
- patch_df
DataFrame with columns representing image paths, labels, and object bounding boxes.
- Type:
pandas.DataFrame or gpd.GeoDataFrame
- label_col
The name of the column containing the image labels.
- Type:
str
- label_index_col
The name of the column containing the labels indices.
- Type:
str
- patch_paths_col
The name of the column in the DataFrame containing the image paths.
- Type:
str
- image_mode
The color space of the images.
- Type:
str
- parent_path
The path to the directory containing parent images.
- Type:
str
- create_context
Whether or not to create context maps.
- Type:
bool
- context_dir
The path to context maps.
- Type:
str
- unique_labels
The unique labels in label_col.
- Type:
list or str
- label_col
- label_index_col
- image_mode
- patch_paths_col
- parent_path
- create_context
- context_dir
- save_context(processors=10, sleep_time=0.001, use_parhugin=True, overwrite=False)
Save context images for all patches in the patch_df.
- Parameters:
processors (int, optional) – The number of required processors for the job, by default 10.
sleep_time (float, optional) – The time to wait between jobs, by default 0.001.
use_parhugin (bool, optional) – Whether to use Parhugin to parallelize the job, by default True.
overwrite (bool, optional) – Whether to overwrite existing parent files, by default False.
- Return type:
None
Notes
Parhugin is a Python package for parallelizing computations across multiple CPU cores. The method uses Parhugin to parallelize the computation of saving parent patches to disk. When Parhugin is installed and use_parhugin is set to True, the method parallelizes the calling of the get_context_id method with its corresponding arguments. If Parhugin is not installed or use_parhugin is set to False, the method executes the loop over patch indices sequentially instead.
- get_context_id(id, overwrite=False, save_context=False, return_image=True)
Save the parents of a specific patch to the specified location.
- Parameters:
id – Index of the patch in the dataset.
overwrite (bool, optional) – Whether to overwrite the existing parent files. Default is False.
save_context (bool, optional) – Whether to save the context image. Default is False.
return_image (bool, optional) – Whether to return the context image. Default is True.
- Raises:
ValueError – If the patch is not found in the dataset.
- Return type:
None
- plot_sample(idx)
Plot a sample patch and its corresponding context from the dataset.
- Parameters:
idx (int) – The index of the sample to plot.
- Returns:
Displays the plot of the sample patch and its corresponding context.
- Return type:
None
Notes
This method plots a sample patch and its corresponding context side-by-side in a single figure with two subplots. The figure size is set to 10in x 5in, and the titles of the subplots are set to “Patch” and “Context”, respectively. The resulting figure is displayed using the matplotlib library (required).
- class mapreader.ClassifierContainer(model, labels_map, dataloaders=None, device='default', input_size=(224, 224), is_inception=False, load_path=None, force_device=False, **kwargs)
A class to store and train a PyTorch model.
- Parameters:
model (str, nn.Module or None) –
The PyTorch model to add to the object.
If passed as a string, will run _initialize_model(model, **kwargs). See https://pytorch.org/vision/0.8/models.html for options.
Must be None if load_path is specified, as the model will be loaded from file.
labels_map (Dict or None) – A dictionary containing the mapping of each label index to its label, with indices as keys and labels as values (i.e. idx: label). Can only be None if load_path is specified, as labels_map will be loaded from file.
dataloaders (Dict or None) – A dictionary containing set names as keys and dataloaders as values (i.e. set_name: dataloader).
device (str, optional) – The device to be used for training and storing models. Can be set to “default”, “cpu”, “cuda:0”, etc. By default, “default”.
input_size (int, optional) – The expected input size of the model. Default is (224,224).
is_inception (bool, optional) – Whether the model is an Inception-style model. Default is False.
load_path (str, optional) – The path to an .obj file containing a saved model.
force_device (bool, optional) – Whether to force the use of a specific device. If set to True, the default device is used. Defaults to False.
kwargs (Dict) – Keyword arguments to pass to the _initialize_model() method (if passing model as a string).
- device
The device being used for training and storing models.
- Type:
torch.device
- dataloaders
A dictionary to store dataloaders for the model.
- Type:
dict
- labels_map
A dictionary mapping label indices to their labels.
- Type:
dict
- dataset_sizes
A dictionary to store sizes of datasets for the model.
- Type:
dict
- model
The model.
- Type:
torch.nn.Module
- input_size
The size of the input to the model.
- Type:
None or tuple of int
- is_inception
A flag indicating if the model is an Inception model.
- Type:
bool
- optimizer
The optimizer being used for training the model.
- Type:
None or torch.optim.Optimizer
- scheduler
The learning rate scheduler being used for training the model.
- Type:
None or torch.optim.lr_scheduler._LRScheduler
- loss_fn
The loss function to use for training the model.
- Type:
None or nn.modules.loss._Loss
- metrics
A dictionary to store the metrics computed during training.
- Type:
dict
- last_epoch
The last epoch number completed during training.
- Type:
int
- best_loss
The best validation loss achieved during training.
- Type:
torch.Tensor
- best_epoch
The epoch in which the best validation loss was achieved during training.
- Type:
int
- tmp_save_filename
A temporary file name to save checkpoints during training and validation.
- Type:
str
- generate_layerwise_lrs(min_lr, max_lr, spacing='linspace')
Calculates layer-wise learning rates for a given set of model parameters.
- Parameters:
min_lr (float) – The minimum learning rate to be used.
max_lr (float) – The maximum learning rate to be used.
spacing (str, optional) – The type of sequence to use for spacing the learning rates over the specified interval. Can be either "linspace" or "geomspace", where “linspace” uses evenly spaced learning rates over a specified interval and “geomspace” uses learning rates spaced evenly on a log scale (a geometric progression). By default "linspace".
- Returns:
A list of dictionaries containing the parameters and learning rates for each layer.
- Return type:
list of dicts
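The two spacing options can be sketched with plain Python. This is a simplified stand-in for the linspace/geomspace sequences the method's naming suggests; the grouping of parameters into layers is illustrative:

```python
def layerwise_lrs(min_lr, max_lr, num_groups, spacing="linspace"):
    """Return one learning rate per parameter group, from min_lr to max_lr."""
    if num_groups == 1:
        return [min_lr]
    if spacing == "linspace":
        # Evenly spaced over the interval.
        step = (max_lr - min_lr) / (num_groups - 1)
        return [min_lr + i * step for i in range(num_groups)]
    if spacing == "geomspace":
        # Evenly spaced on a log scale (geometric progression).
        ratio = (max_lr / min_lr) ** (1 / (num_groups - 1))
        return [min_lr * ratio**i for i in range(num_groups)]
    raise ValueError(f"Unknown spacing: {spacing}")

print(layerwise_lrs(1e-4, 1e-2, 3, "linspace"))
print(layerwise_lrs(1e-4, 1e-2, 3, "geomspace"))
```

With three groups, "linspace" gives 1e-4, 5.05e-3, 1e-2, while "geomspace" gives 1e-4, 1e-3, 1e-2: early layers get small learning rates and later layers larger ones.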
- initialize_optimizer(optim_type='adam', params2optimize='default', optim_param_dict=None, add_optim=True)
Initializes an optimizer for the model and adds it to the classifier object.
- Parameters:
optim_type (str, optional) – The type of optimizer to use. Can be set to "adam" (default), "adamw", or "sgd".
params2optimize (str or iterable, optional) – The parameters to optimize. If set to "default", all model parameters that require gradients will be optimized. Default is "default".
optim_param_dict (dict, optional) – The parameters to pass to the optimizer constructor as a dictionary, by default {"lr": 1e-3}.
add_optim (bool, optional) – If True, adds the optimizer to the classifier object, by default True.
- Returns:
optimizer – The initialized optimizer. Only returned if add_optim is set to False.
- Return type:
torch.optim.Optimizer
Notes
If add_optim is True, the optimizer will be added to the object.
Note that the first argument of an optimizer is the parameters to optimize, e.g. params2optimize = model_ft.parameters():
model_ft.parameters(): all parameters are being optimized
model_ft.fc.parameters(): only parameters of the final layer are being optimized
Here, we use:
filter(lambda p: p.requires_grad, self.model.parameters())
- add_optimizer(optimizer)
Add an optimizer to the classifier object.
- Parameters:
optimizer (torch.optim.Optimizer) – The optimizer to add to the classifier object.
- Return type:
None
- initialize_scheduler(scheduler_type='steplr', scheduler_param_dict=None, add_scheduler=True)
Initializes a learning rate scheduler for the optimizer and adds it to the classifier object.
- Parameters:
scheduler_type (str, optional) – The type of learning rate scheduler to use. Can be either "steplr" (default) or "onecyclelr".
scheduler_param_dict (dict, optional) – The parameters to pass to the scheduler constructor, by default {"step_size": 10, "gamma": 0.1}.
add_scheduler (bool, optional) – If True, adds the scheduler to the classifier object, by default True.
- Raises:
ValueError – If the specified scheduler_type is not implemented.
- Returns:
scheduler – The initialized learning rate scheduler. Only returned if add_scheduler is set to False.
- Return type:
torch.optim.lr_scheduler._LRScheduler
- add_scheduler(scheduler)
Add a scheduler to the classifier object.
- Parameters:
scheduler (torch.optim.lr_scheduler._LRScheduler) – The scheduler to add to the classifier object.
- Raises:
ValueError – If no optimizer has been set. Use initialize_optimizer or add_optimizer to set an optimizer first.
- Return type:
None
- add_loss_fn(loss_fn='cross entropy')
Add a loss function to the classifier object.
- Parameters:
loss_fn (str or torch.nn.modules.loss._Loss) – The loss function to add to the classifier object. Accepted string values are “cross entropy” or “ce” (cross-entropy), “bce” (binary cross-entropy) and “mse” (mean squared error).
- Returns:
The function only modifies the loss_fn attribute of the classifier and does not return anything.
- Return type:
None
- model_summary(input_size=None, trainable_col=False, **kwargs)
Print a summary of the model.
- Parameters:
input_size (tuple or list, optional) – The size of the input data. If None, input size is taken from the "train" dataloader (self.dataloaders["train"]).
trainable_col (bool, optional) – If True, adds a column showing which parameters are trainable. Defaults to False.
**kwargs (dict) – Keyword arguments to pass to torchinfo.summary() (see https://github.com/TylerYep/torchinfo).
- Return type:
None
Notes
Other ways to check params:
sum(p.numel() for p in myclassifier.model.parameters())
sum(p.numel() for p in myclassifier.model.parameters() if p.requires_grad)
And:
for name, param in self.model.named_parameters():
    n = name.split(".")[0].split("_")[0]
    print(name, param.requires_grad)
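The two counting one-liners above can be exercised without a real model by using hypothetical stand-in parameters (the real objects are torch.nn.Parameter instances exposing the same numel() and requires_grad members):

```python
from types import SimpleNamespace

# Two fake parameters: 1000 trainable weights and 500 frozen ones.
params = [
    SimpleNamespace(numel=lambda: 1000, requires_grad=True),
    SimpleNamespace(numel=lambda: 500, requires_grad=False),
]

# Same expressions as in the Notes, applied to the stand-ins.
total = sum(p.numel() for p in params)
trainable = sum(p.numel() for p in params if p.requires_grad)
print(total, trainable)  # → 1500 1000
```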
- freeze_layers(layers_to_freeze=None)
Freezes the specified layers in the neural network by setting the requires_grad attribute to False for their parameters.
- Parameters:
layers_to_freeze (list of str, optional) – List of names of the layers to freeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are frozen. Otherwise, only the parameters with an exact match to the layer name are frozen. By default, [].
- Returns:
The function only modifies the requires_grad attribute of the specified parameters and does not return anything.
- Return type:
None
Notes
Wildcards are accepted in the layers_to_freeze parameter.
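A minimal sketch of the documented matching rule (a re-implementation for illustration, not MapReader's actual code): a trailing "*" turns the entry into a substring match, otherwise the parameter name must match exactly.

```python
def should_freeze(param_name, layers_to_freeze):
    for layer in layers_to_freeze:
        if layer.endswith("*"):
            # Wildcard: freeze any parameter whose name contains the prefix.
            if layer[:-1] in param_name:
                return True
        elif param_name == layer:
            # No wildcard: only an exact name match is frozen.
            return True
    return False

print(should_freeze("layer4.0.conv1.weight", ["layer4*"]))  # → True
print(should_freeze("layer4.0.conv1.weight", ["layer4"]))   # → False
```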
- unfreeze_layers(layers_to_unfreeze=None)
Unfreezes the specified layers in the neural network by setting the requires_grad attribute to True for their parameters.
- Parameters:
layers_to_unfreeze (list of str, optional) – List of names of the layers to unfreeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are unfrozen. Otherwise, only the parameters with an exact match to the layer name are unfrozen. By default, [].
- Returns:
The function only modifies the requires_grad attribute of the specified parameters and does not return anything.
- Return type:
None
Notes
Wildcards are accepted in the layers_to_unfreeze parameter.
- only_keep_layers(only_keep_layers_list=None)
Only keep the specified layers (only_keep_layers_list) for gradient computation during backpropagation.
- Parameters:
only_keep_layers_list (list, optional) – List of layer names to keep. All other layers will have their gradient computation turned off. Default is [].
- Returns:
The function only modifies the requires_grad attribute of the specified parameters and does not return anything.
- Return type:
None
- inference(set_name='infer', verbose=False, print_info_batch_freq=5)
Run inference on a specified dataset (set_name).
- Parameters:
set_name (str, optional) – The name of the dataset to run inference on, by default "infer".
verbose (bool, optional) – Whether to print verbose outputs, by default False.
print_info_batch_freq (int, optional) – The frequency of printouts, by default 5.
- Return type:
None
Notes
This method calls the train() method with num_epochs set to 1 and all the other parameters specified in the function arguments.
- train_component_summary()
Print a summary of the optimizer, loss function, and trainable model components.
- Return type:
None
- train(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, remove_after_load=True, print_info_batch_freq=5)
Train the model on the specified phases for a given number of epochs.
Wrapper function for the train_core() method to capture exceptions (KeyboardInterrupt is the only supported exception currently).
- Parameters:
phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].
num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.
save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.
verbose (bool, optional) – Whether to print verbose outputs, by default False.
tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.
tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.
remove_after_load (bool, optional) – Whether to remove the temporary file after loading it. Default is True.
print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.
- Returns:
The function saves the model to the save_model_dir directory, and optionally to a temporary file. If interrupted with a KeyboardInterrupt, the function tries to load the temporary file. If no temporary file is found, it continues without loading.
- Return type:
None
Notes
Refer to the documentation of train_core() for more information.
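The documented interrupt handling amounts to this pattern (a hedged sketch with hypothetical callables, not MapReader's code): run the core loop, and on KeyboardInterrupt fall back to the most recent temporary checkpoint if one exists.

```python
def train_with_recovery(train_core, load_tmp_file):
    # Run the core training loop; on Ctrl-C, try to restore the most
    # recent temporary checkpoint, and continue if none exists.
    try:
        train_core()
    except KeyboardInterrupt:
        try:
            load_tmp_file()
        except FileNotFoundError:
            print("No temporary file found; continuing without loading.")
```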
- train_core(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, print_info_batch_freq=5)
Trains/fine-tunes a classifier for the specified number of epochs on the given phases using the specified hyperparameters.
- Parameters:
phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].
num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.
save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.
verbose (bool, optional) – Whether to print verbose outputs, by default False.
tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.
tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.
print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.
- Raises:
ValueError –
If the loss function is not set. Use the add_loss_fn() method to set the loss function.
If the optimizer is not set and the phase is "train". Use the initialize_optimizer() or add_optimizer() method to set the optimizer.
KeyError – If the specified phase cannot be found in the keys of the object’s dataloaders dictionary property.
- Return type:
None
- calculate_add_metrics(y_true, y_pred, y_score, phase, epoch=-1, tboard_writer=None)
Calculate and add metrics to the classifier’s metrics dictionary.
- Parameters:
y_true (array-like of shape (n_samples,)) – True binary labels or multiclass labels. Can be considered ground truth or (correct) target values.
y_pred (array-like of shape (n_samples,)) – Predicted binary labels or multiclass labels. The estimated targets as returned by a classifier.
y_score (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class. Only required when y_pred is not binary.
phase (str) – Name of the current phase, typically "train" or "val". See train function.
epoch (int, optional) – Current epoch number. Default is -1.
tboard_writer (object, optional) – TensorBoard SummaryWriter object to write the metrics. Default is None.
- Return type:
None
Notes
This method uses both the sklearn.metrics.precision_recall_fscore_support() and sklearn.metrics.roc_auc_score() functions from scikit-learn to calculate the metrics for each average type ("micro", "macro" and "weighted"). The results are then added to the metrics dictionary. It also writes the metrics to the TensorBoard SummaryWriter, if tboard_writer is not None.
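The averaging types can be illustrated with a small hand computation (plain Python, shown here for intuition; the method itself delegates to scikit-learn):

```python
def precision_for_class(y_true, y_pred, cls):
    # Precision = true positives / predicted positives for one class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    predicted = sum(1 for p in y_pred if p == cls)
    return tp / predicted if predicted else 0.0

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
classes = [0, 1, 2]

# "macro": unweighted mean of per-class precision.
macro = sum(precision_for_class(y_true, y_pred, c) for c in classes) / len(classes)
# "micro": pooled over all predictions (for single-label multiclass data
# this equals overall accuracy).
micro = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(round(macro, 3), round(micro, 3))  # → 0.889 0.8
```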
- plot_metric(y_axis, y_label, legends, x_axis='epoch', x_label='epoch', colors=5 * ['k', 'tab:red'], styles=10 * ['-'], markers=10 * ['o'], figsize=(10, 5), plt_yrange=None, plt_xrange=None)
Plot the metrics of the classifier object.
- Parameters:
y_axis (list of str) – A list of metric names to be plotted on the y-axis.
y_label (str) – The label for the y-axis.
legends (list of str) – The legend labels for each metric.
x_axis (str, optional) – The metric to be used as the x-axis. Can be "epoch" (default) or any other metric name present in the dataset.
x_label (str, optional) – The label for the x-axis. Defaults to "epoch".
colors (list of str, optional) – The colors to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 5 * ["k", "tab:red"].
styles (list of str, optional) – The line styles to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["-"].
markers (list of str, optional) – The markers to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["o"].
figsize (tuple of int, optional) – The size of the figure in inches. Defaults to (10, 5).
plt_yrange (tuple of float, optional) – The range of values for the y-axis. Defaults to None.
plt_xrange (tuple of float, optional) – The range of values for the x-axis. Defaults to None.
- Return type:
None
Notes
This function requires the matplotlib package.
- show_sample(set_name='train', batch_number=1, print_batch_info=True, figsize=(15, 10))
Displays a sample of training or validation data in a grid format with their corresponding class labels.
- Parameters:
set_name (str, optional) – Name of the dataset ("train"/"validation") to display the sample from, by default "train".
batch_number (int, optional) – Which batch to display, by default 1.
print_batch_info (bool, optional) – Whether to print information about the batch size, by default True.
figsize (tuple, optional) – Figure size (width, height) in inches, by default (15, 10).
- Returns:
Displays the sample images with their corresponding class labels.
- Return type:
None
- Raises:
StopIteration – If the specified number of batches to display exceeds the total number of batches in the dataset.
Notes
This method uses the dataloader of the ImageClassifierData class and the torchvision.utils.make_grid() function to display the sample data in a grid format. It also calls the _imshow method of the ImageClassifierData class to show the sample data.
- print_batch_info(set_name='train')
Print information about a dataset’s batches, samples, and batch-size.
- Parameters:
set_name (str, optional) – Name of the dataset to display batch information for (default is "train").
None
- show_inference_sample_results(label, num_samples=6, set_name='test', min_conf=None, max_conf=None, figsize=(15, 15))
Shows a sample of the results of the inference.
- Parameters:
label (str) – The label for which to display results.
num_samples (int, optional) – The number of sample results to display. Defaults to 6.
set_name (str, optional) – The name of the dataset split to use for inference. Defaults to "test".
min_conf (float, optional) – The minimum confidence score for a sample result to be displayed. Samples with lower confidence scores will be skipped. Defaults to None.
max_conf (float, optional) – The maximum confidence score for a sample result to be displayed. Samples with higher confidence scores will be skipped. Defaults to None.
figsize (tuple of int, optional) – Figure size (width, height) in inches for displaying the sample results. Defaults to (15, 15).
- Return type:
None
- save(save_path='default.obj', force=False)
Save the object to a file.
- Parameters:
save_path (str, optional) – The path to the file to write. If the file already exists and force is not True, a FileExistsError is raised. Defaults to "default.obj".
force (bool, optional) – Whether to overwrite the file if it already exists. Defaults to False.
- Raises:
FileExistsError – If the file already exists and force is not True.
- Return type:
None
Notes
The object is saved in two parts. First, a serialized copy of the object’s dictionary is written to the specified file using the joblib.dump function. The object’s model attribute is excluded from this dictionary and saved separately using the torch.save function, with a filename derived from the original save_path.
- save_predictions(set_name, save_path=None, delimiter=',')
Save the predictions for the specified dataset to a file.
- Parameters:
set_name (str)
save_path (str | None)
delimiter (str)
- load_dataset(dataset, set_name, batch_size=16, sampler=None, shuffle=False, num_workers=0, **kwargs)
Creates a DataLoader from a PatchDataset and adds it to the dataloaders dictionary.
- Parameters:
dataset (PatchDataset) – The dataset to add.
set_name (str) – The name to use for the dataset.
batch_size (int, optional) – The batch size to use when creating the DataLoader, by default 16.
sampler (Sampler or None, optional) – The sampler to use when creating the DataLoader, by default None.
shuffle (bool, optional) – Whether to shuffle the PatchDataset, by default False.
num_workers (int, optional) – The number of worker threads to use for loading data, by default 0.
- Return type:
None
- load(load_path, force_device=False)
This function loads the state of a class instance from a saved file using the joblib library. It also loads a PyTorch model from a separate file and maps it to the device used to load the class instance.
- Parameters:
load_path (str) – Path to the saved file to load.
force_device (bool or str, optional) – Whether to force the use of a specific device, or the name of the device to use. If set to True, the default device is used. Defaults to False.
- Raises:
FileNotFoundError – If the specified file does not exist.
- Return type:
None
- cprint(type_info, bc_color, text)
Print colored text with additional information.
- Parameters:
type_info (str) – The type of message to display.
bc_color (str) – The color to use for the message text.
text (str) – The text to display.
- Returns:
The colored message is displayed on the standard output stream.
- Return type:
None
- update_progress(progress, text='', barLength=30)
Update the progress bar.
- Parameters:
progress (float or int) – The progress value to display, between 0 and 1. If an integer is provided, it will be converted to a float. If a value outside the range [0, 1] is provided, it will be clamped to the nearest valid value.
text (str, optional) – Additional text to display after the progress bar, defaults to "".
barLength (int, optional) – The length of the progress bar in characters, defaults to 30.
- Raises:
TypeError – If progress is not a floating point value or an integer.
- Returns:
The progress bar is displayed on the standard output stream.
- Return type:
None
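The documented input handling can be sketched as follows (a hypothetical helper for illustration, not the actual implementation):

```python
def clamp_progress(progress):
    # Mirrors the documented behaviour: reject non-numeric input,
    # convert ints to float, and clamp to the valid range [0, 1].
    if not isinstance(progress, (int, float)):
        raise TypeError("progress must be a float or an int")
    return min(max(float(progress), 0.0), 1.0)

print(clamp_progress(2))     # → 1.0
print(clamp_progress(-0.5))  # → 0.0
```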
- class mapreader.ContextPostProcessor(patch_df, labels_map, delimiter=',')
A class for post-processing predictions on patches using the surrounding context.
- Parameters:
patch_df (pd.DataFrame | geopandas.GeoDataFrame | str | pathlib.Path) – the DataFrame containing patches and predictions
labels_map (dict) – the dictionary mapping label indices to their labels. e.g. {0: “no”, 1: “railspace”}.
delimiter (str, optional) – The delimiter used in the CSV file, by default “,”.
- required_columns = ['parent_id', 'pixel_bounds', 'pred', 'predicted_label', 'conf']
- patch_df
- labels_map
- context
- get_context(labels)
Get the context of the patches with the specified labels.
- Parameters:
labels (str | list) – The label(s) to get context for.
- update_preds(remap, conf=0.7, inplace=False)
Update the predictions of the chosen patches based on their context.
- Parameters:
remap (dict) – A dictionary mapping the old labels to the new labels.
conf (float, optional) – Patches with confidence scores below this value will be relabelled, by default 0.7.
inplace (bool, optional) – Whether to relabel in place or create new dataframe columns, by default False.
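The confidence rule in update_preds can be sketched on plain tuples (a simplified illustration that applies only the remap-plus-threshold step; the real method also takes the surrounding context into account):

```python
def relabel(patches, remap, conf=0.7):
    # Each patch is a (predicted_label, confidence) pair. Low-confidence
    # predictions whose label appears in `remap` are rewritten.
    out = []
    for label, confidence in patches:
        if label in remap and confidence < conf:
            out.append((remap[label], confidence))
        else:
            out.append((label, confidence))
    return out

patches = [("railspace", 0.4), ("railspace", 0.9)]
print(relabel(patches, {"railspace": "no"}))
# → [('no', 0.4), ('railspace', 0.9)]
```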
- class mapreader.OcclusionAnalyzer(patch_df, model, transform='default', delimiter=',', device='default')
A class for carrying out occlusion analysis on patches.
- Parameters:
patch_df (pd.DataFrame | gpd.GeoDataFrame | str | pathlib.Path) – The DataFrame containing patches and predictions.
model (str or nn.Module) – The PyTorch model to add to the object. If a string, this should be the path to a model checkpoint.
transform (str or callable) – The transform to apply to the patches. Options are “default” or a torchvision transform. By default, “default”.
delimiter (str) – The delimiter used in the patch_df csv file. By default, “,”.
device (str) – The device to use. By default, “default”.
- loss_fn = None
- add_loss_fn(loss_fn='cross entropy')
Add a loss function to the object.
- Parameters:
loss_fn (str or torch.nn.modules.loss._Loss) – The loss function to use. Can be a string or a torch.nn loss function. Accepted string values are “cross entropy” or “ce” (cross-entropy), “bce” (binary cross-entropy) and “mse” (mean squared error). By default, “cross entropy” is used.
- Return type:
None
- run_occlusion(label, sample_size=10, save=False, path_save='./occlusion_analysis/', block_size=14)
Run occlusion analysis on a sample of patches for a given label.
- Parameters:
label (str) – The label to run the analysis on.
sample_size (int) – The number of patches to run the analysis on. By default, 10.
save (bool) – Whether to save the occlusion analysis images. By default, False.
path_save (str) – The path to save the occlusion analysis images to. By default, “./occlusion_analysis/”.
block_size (int) – The size of the occlusion block. By default, 14.
- mapreader.print_version()
Print the current version of mapreader.