
Detector Insights

Localized training warnings are designed to offer you precise feedback on potential issues or mistakes in your dataset. Rather than sifting through tons of data to locate a small discrepancy, these warnings will directly point you to the problematic areas. This means:

  • Decreased debugging workload: the tedious task of debugging is made easier, allowing you to quickly refine your dataset.
  • Effortless issue spotting: especially in larger datasets, localized warnings shine a spotlight on areas needing attention, simplifying the troubleshooting process.

(Note that some warnings already existed in the training report, but these were not localized on your imagery.)

The warnings are accessible in the Detector Insights panel of the Detector Training UI, located in the navigation menu on the right, as seen here:

There are various types of warnings that we can geolocate. The categories we currently support are:

Outlines overlapping in count mode

When your detector is in count mode, your outlines should not overlap. In the example below, if you are trying to detect buildings and have overlapping outlines, the model will automatically merge those outlines together during the training process and will “think” that you are trying to predict all the adjacent buildings as a single detection rather than as individual objects.

You should make sure to separate those outlines. Insetting your outlines slightly from the actual object boundary can help.
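
As a concrete illustration of the merging behaviour, here is a minimal sketch using the shapely library (the polygons, the inset distance and the library choice are our own illustrative assumptions, not the platform’s internal code): two overlapping outlines collapse into a single shape, whereas slightly inset outlines stay separate.

```python
# Illustrative sketch only: overlapping outlines collapse into one object when
# merged, which is effectively what happens during count-mode training.
from shapely.geometry import Polygon
from shapely.ops import unary_union

building_a = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
building_b = Polygon([(3.8, 0), (7.8, 0), (7.8, 3), (3.8, 3)])  # overlaps building_a

merged = unary_union([building_a, building_b])
print(merged.geom_type)  # "Polygon" -> the two outlines became a single object

# Insetting each outline slightly keeps them apart.
inset_a = building_a.buffer(-0.2)
inset_b = building_b.buffer(-0.2)
print(unary_union([inset_a, inset_b]).geom_type)  # "MultiPolygon" -> two objects
```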

Holes in annotations in count mode

In count mode, the model is focused on getting an accurate count of your outlines. In the example below, the holes will in fact be filled in during the training process. In other words, it makes no difference whether or not you drew the holes in the first place: the model will learn to detect the full building without holes. Additionally, in count mode the model will never output a detection with holes. If you care more about the area coverage of an object or texture in your imagery, you should use segmentation mode, where outlines with holes are valid and will be used by the model properly.
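
Here is a tiny sketch of this behaviour (the geometry and the shapely calls below are made up for illustration, not platform code): filling the hole changes the covered area but not the count.

```python
# Illustrative sketch only: a hole drawn inside an outline is effectively
# filled in count mode, so only the exterior ring influences the count.
from shapely.geometry import Polygon

outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
courtyard = [(4, 4), (6, 4), (6, 6), (4, 6)]  # hole drawn by the user

annotated = Polygon(outer, holes=[courtyard])
filled = Polygon(annotated.exterior.coords)   # roughly what count mode trains on

print(annotated.area)  # 96.0 -> hole subtracted from the area
print(filled.area)     # 100.0 -> hole ignored; still one object either way
```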

Training areas and Accuracy areas overlapping

Here the user has drawn overlapping training and accuracy areas, which is a problem because you should not assess the accuracy of your detector on the same data that you trained it on. Your model will likely produce a very high score here, which is not representative of how well it will perform across regions not directly seen by the detector. Basically, you’re letting the detector “cheat” on the score.

Imagine teaching a math student that 1 + 1 = 2, and then giving them a quiz on addition where the only question is “what is 1 + 1?”. That’s not a very good quiz if you aren’t diversifying the problems you ask them to solve.
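
For the curious, the kind of check behind this warning can be sketched in a few lines (this is an assumed simplification for illustration, not the actual implementation):

```python
# Illustrative sketch only: flag any training area that intersects an accuracy
# area, since the two should cover disjoint parts of the imagery.
from shapely.geometry import box

training_areas = [box(0, 0, 100, 100), box(200, 0, 300, 100)]
accuracy_areas = [box(80, 80, 160, 160)]  # overlaps the first training area

for i, train in enumerate(training_areas):
    for j, acc in enumerate(accuracy_areas):
        if train.intersects(acc):
            overlap = train.intersection(acc).area
            print(f"Training area {i} overlaps accuracy area {j} "
                  f"by {overlap:.0f} square units")
```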

Warnings about your accuracy area results

With these types of warnings, we compare the predictions over an accuracy area with your labeled ground truth and point out spots where the two don’t agree. Of course, this can also be done manually by sorting your worst-performing accuracy areas and looking at the results, but the idea is to guide your review process when you have a large number of accuracy areas.

The different types of mistakes that will be marked in this case include:

  • instances where there is a prediction but no matching ground truth,
  • where there is a ground truth but no matching prediction,
  • where there is a partial overlap between the two but not enough to count as a match by the scoring algorithm, 
  • when the two match but the detected classes are different.

Currently this only works in count mode; we hope to extend it to segmentation mode in the near future.
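
To make these four cases concrete, here is a hedged sketch of an IoU-style matching pass (the 0.5 threshold and the matching details are assumptions for illustration, not the exact scoring algorithm used by the platform):

```python
# Illustrative sketch only: classify each prediction against the ground truth
# into the four situations listed above.
from shapely.geometry import box

IOU_THRESHOLD = 0.5  # assumed threshold for counting a match

ground_truth = [
    {"geom": box(0, 0, 10, 10), "class": "seal"},
    {"geom": box(20, 0, 30, 10), "class": "seal"},
    {"geom": box(40, 0, 50, 10), "class": "seal"},   # will end up unmatched
]
predictions = [
    {"geom": box(1, 1, 11, 11), "class": "seal"},       # good match
    {"geom": box(20, 0, 30, 10), "class": "sea lion"},  # class mismatch
    {"geom": box(70, 0, 80, 10), "class": "seal"},      # nothing to match
]

def iou(a, b):
    inter = a.intersection(b).area
    return inter / a.union(b).area if inter else 0.0

matched = set()
for pred in predictions:
    scores = [iou(pred["geom"], gt["geom"]) for gt in ground_truth]
    best = max(range(len(ground_truth)), key=lambda i: scores[i])
    if scores[best] == 0.0:
        print("Prediction with no matching ground truth")
    elif scores[best] < IOU_THRESHOLD:
        print("Partial overlap, not enough to count as a match")
    elif pred["class"] != ground_truth[best]["class"]:
        matched.add(best)
        print("Shapes match but the classes differ")
    else:
        matched.add(best)

unmatched = len(ground_truth) - len(matched)
print(f"{unmatched} ground truth outline(s) with no matching prediction")
```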

Here are some examples:

Detection does not match the annotations

Did the user forget to annotate something in the accuracy area here? Maybe there are similar objects in your training areas that were mistakenly outlined, or perhaps you need more examples of what isn’t your object of interest in your dataset. Or is it just an object that looks visually similar to what you are trying to identify, and the detector is having a tough time with it? (You might want to check the confidence maps.)

Annotation and detection shapes don't match

Was the annotation itself poorly created, and should it actually be separated into individual outlines here? Or is it the detector, and are there annotations in the training set that shouldn’t be split up? It could also just be that this is a hard case with lots of objects overlapping and crisscrossing, making it hard for the detector to, quite literally in the case of this seal detector, tell head from tail.

Class does not match the detection

  • The class of this object could be mislabeled.
  • It could be a type of object that is difficult to identify and, if there are enough of them in your dataset, may warrant having its own class.
  • There may be some mislabelling of classes in your training dataset that needs to be reviewed.

Unlike warnings about your dataset, you may not be able to directly fix these warnings, as they relate to the output of your detector. As before, when there are problems with the results in your accuracy areas: check your annotations in both your training and accuracy areas, experiment with settings and classes, revisit your imagery resolution, etc.

Note!

Note that, as with the dataset warnings, each time you train we only show a small subset of these warnings, over the worst-performing accuracy areas, to avoid flooding your detector space with warnings. The goal is to show you what is not performing well and guide your detector experimentation, not to point out every single error in your results.

Also note that because the warnings are based on the output of the detector, the same warning may repeat itself on objects that are too difficult for the detector to get right. However, since we randomize which subset of warnings we show, this shouldn’t happen too often.

What is dataset recommendation?

Dataset recommendation is a tool that helps you during the training phase of your detector. It generates markers similar to the markers you can create yourself; however, these auto-generated recommendation markers denote regions where your dataset could use more coverage. Wherever there is a recommendation marker, you will want to consider adding a training/accuracy area centered on it.

How to use the recommendations

When you generate dataset recommendations, a number of markers will appear on your training images. These dataset recommendation markers help you identify regions in your images where you should add more training/accuracy areas.

Remember: you still have to draw the areas yourself as well as annotate all the objects, just like with any other training/accuracy area that you place yourself!

Types of recommendations:

Training recommendations

recommendation for adding a Training area

Accuracy recommendations

recommendation for adding an Accuracy area

Remember: don’t make your accuracy areas excessively large, avoid redundant data, and try to capture variety.

You can resolve a marker once you have finished drawing the relevant training or accuracy area and outlining your objects.
You can also easily navigate through your recommendation markers via the Detector Insights panel.

Every time you re-generate dataset recommendations, any existing automated recommendations are removed to avoid flooding your workspace, and a new set is generated based on the current state of your dataset. In short, the tool is meant to be used as part of your iterative workflow: by mixing your annotation strategy between reviewing testing/accuracy results and incorporating suggestions from dataset recommendation, you can produce a well-performing, complete and robust detector!

Another example!

Here’s another example where we ran the tool on a tree-growing plantation and got the following recommended areas:

What’s quite remarkable about this is the variety between the recommended regions. You’ll notice that most of the areas don’t have the object of interest, which in this case is holes for planting trees. But that’s okay because these future “empty” training/accuracy areas will help teach the model what the holes do not look like, thus preventing false positives in your results. In fact, we ran our detector before adding more areas, then zoomed to the marker locations, and in most of these regions, we were getting many unexpected false positives (and several false negatives). Adding training areas solved most of these issues and increased the overall accuracy of our detector by a whopping 7%!

Dataset recommendation report

For a more detailed view of dataset recommendations, you can also have a look at the dataset recommendation report. The report is a way to visualize and understand your data better as you create your dataset by revealing visual patterns in your data. By doing so, you can have a better understanding of things like “Does my training dataset have enough variety of coverage?” and “Will my accuracy areas produce a score that is actually representative of the performance of my detector?”

In any machine learning workflow, the first step is always to visualize the data properly. This could just mean staring at the imagery for a long time until you have some kind of base intuition about it (what it looks like, the diversity of content, etc.), but that can take quite some time. The dataset recommendation report helps guide your visualization process so you can more quickly get a grasp of your data, understand the potential weaknesses of your detector, and find ways to improve it.

 

The key thing to remember here is that the output of the report, as in the case of the training report, is a better understanding of your dataset and imagery that will give you the intuition necessary to produce better detectors.

 

Why is it useful?

It can often be difficult to know how to improve your model. It’s easy enough to get started by annotating a bunch of examples of the object/pattern you’re trying to detect. But at some point, the question of where to add your next training area always arises. Geospatial imagery is large, and it’s hard to go through all of it yourself and make decisions at such a scale. That’s why we created the dataset recommendation features: to help our users make good decisions on how to improve their dataset.

 

How to use the report?

The dataset recommendation report can be accessed from the top bar next to the “Train Detector” button as shown below. It is located in the same popup as the Training Report.

You can generate a report for each detector. It will use all of the training images in your detector.

Report generation time varies depending on the amount of imagery in your training dataset. The report will process just the detection areas in your training images if they have been set, otherwise the entire image.

After hitting the generate button, you will get a detailed report similar to the example below:

In more detail, the report works by dividing your imagery into small tiles and then finding and grouping similar tiles together based on visual similarity. For example, if you have imagery that contains different geographical regions (like forest, water, urban, etc.), it will be able to cluster similar-looking tiles together. Note that this is not a classification algorithm: it will not explicitly label these regions, it just separates different-looking zones from each other.
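
Conceptually, that process can be sketched as follows (the tile size, the mean-colour feature and the use of KMeans are illustrative assumptions; the report’s actual features and clustering method may differ):

```python
# Illustrative sketch only: cut imagery into fixed-size tiles, describe each
# tile with a simple feature, and group similar-looking tiles together.
import numpy as np
from sklearn.cluster import KMeans

TILE = 256  # assumed tile size in pixels

def tile_features(image):
    """Describe each TILE x TILE tile of an RGB image (H, W, 3) by its mean colour."""
    h, w, _ = image.shape
    feats = []
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            tile = image[y:y + TILE, x:x + TILE]
            feats.append(tile.reshape(-1, 3).mean(axis=0))
    return np.array(feats)

# Random stand-in imagery just to make the sketch runnable end to end.
rng = np.random.default_rng(0)
image = rng.integers(0, 255, size=(1024, 1024, 3)).astype(float)

features = tile_features(image)  # one feature vector per tile
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
print(labels)  # tiles sharing a label look similar; no class name is assigned
```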

You can then visualize these tiles using the tile similarity view and clustering functionality in the interactive report and by doing so, understand what regions your current training dataset does and doesn’t cover properly. This will allow you to make decisions on how to improve your dataset. Both the left and right hand views are different ways of exploring and visualizing your data.

Types of areas detected >> tile similarity view

This 2D visualization is obtained by clustering your dataset into groups (“clusters”), computed based on the similarities, differences and completeness of your dataset. You can change the number of clusters to see different groupings of your imagery (we limit the number of clusters to 10 because it becomes hard to distinguish the different colors with more than that). The visualization also shows the coverage of your training and accuracy areas, which you can toggle on and off.

You can zoom in to view the individual tiles as well as click on them to zoom to the location of that tile in the imagery in the detailed “Training images clusters” view available next in the report.
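
If you are curious how such a 2D view can be produced in principle, here is a hedged sketch (the per-tile descriptors, KMeans and the PCA projection are assumptions; the report’s actual method may differ):

```python
# Illustrative sketch only: cluster per-tile feature vectors, then project them
# to 2D so that each tile becomes one coloured point in a scatter-style view.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
tile_descriptors = rng.normal(size=(200, 64))  # placeholder per-tile features

n_clusters = 6  # the report UI caps this at 10
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = km.fit_predict(tile_descriptors)
xy = PCA(n_components=2).fit_transform(tile_descriptors)  # 2D position per tile

print(xy[:3])                                     # coordinates of the first tiles
print(np.bincount(labels, minlength=n_clusters))  # how many tiles per cluster
```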

Training images clusters & new area recommendation

This view gives you a detailed understanding of how your imagery is divided by region and overlays the recommendation markers that suggest candidate regions for adding new training or accuracy areas in your next round of training. You can navigate through these recommendation markers using the arrows located at the top right.

As explained above these markers indicate regions where the system identifies low coverage in training or accuracy areas. They are dynamically generated with each new report, ensuring ongoing refinement and optimization of your training process.

Training images overview >> quick look into clustering across all training images

Here is a high-level overview of how the clustering behaves across all the training images in your dataset. You can click on each image to see it in the full-resolution view on the left, along with the generated recommendation markers. Note that these images are scaled down for performance reasons, and because this overview is only meant to give you a rough idea of the different regions, full resolution is not required. You can see the different regions here based on their color and understand how they are laid out across your imagery in a more natural way. If you adjust the number of clusters, this view will update accordingly.

 

What about accuracy area coverage?

As explained in the accuracy area FAQ entry, if you only have a few accuracy areas there is little to no guarantee that this score is a good measure of the performance of your detector. Getting a representative score for your detector requires a good variety of accuracy areas just as having a good detector requires a good variety of training areas.

To ensure better accuracy area coverage, the same process used for training areas can be applied to accuracy areas: look at the blue dots, see which regions are not covered by them, and add accuracy areas in those regions as necessary to get adequate coverage.
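
That coverage check can be sketched roughly as follows (the cluster assignments and flags below are made-up inputs; the report presents this visually for you):

```python
# Illustrative sketch only: flag clusters of tiles that contain no accuracy
# area at all, i.e. regions whose score coverage is missing.
import numpy as np

tile_cluster = np.array([0, 0, 1, 1, 2, 2, 3, 3])               # cluster id per tile
tile_has_accuracy_area = np.array([1, 0, 0, 0, 1, 1, 0, 0], dtype=bool)

for cluster_id in np.unique(tile_cluster):
    if not tile_has_accuracy_area[tile_cluster == cluster_id].any():
        print(f"Cluster {cluster_id} has no accuracy area coverage; "
              "consider adding an accuracy area in that region")
```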

 

Note that this is an evolving feature; we will likely update various elements of the UI and add additional functionality in the coming months. This documentation entry will be updated accordingly.