However, this approach is error-prone, hard to introduce into devices that rely solely on internal sensors, and can significantly increase the manufacturing cost of the devices.
SLAM in Endoscopy Applications
In the past few years there has been a breakthrough in 3D localization from inputs as simple as a video stream. A family of algorithms called simultaneous localization and mapping (SLAM) can, in real time, build a 3D map of a scene captured by a camera and calculate the camera’s location in that scene with very high accuracy.
The algorithm workflow is as follows. First, it finds key points of interest in the images received from the video stream. These key points are computed on parts of the image with distinctive features, using methods such as SIFT or ORB. Then, for each new frame, the algorithm tries to match each new key point to the key points already detected. When enough key points have been found and matched, the algorithm solves the following optimization problem: what arrangement of cameras and key point locations in 3D space would make each matched key point appear at the correct location in every frame in which it was seen? Once enough key points have been added to the 3D model, some variations of the algorithm can complete the cloud of key points into a 3D mesh that depicts the scene in the video feed with high accuracy. After the scene is initialized, the key points from each new frame can be used to determine the location of that frame’s camera in the scene.
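The optimization problem described above is usually expressed as minimizing a reprojection error: the distance between where a key point was observed in a frame and where the current 3D estimate would project it. The following is a minimal sketch of that cost, not the implementation used in any particular SLAM system. It assumes a simplified pinhole camera that is only translated (no rotation), and the names `project`, `reprojection_error`, and the intrinsics `focal`, `cx`, `cy` are illustrative.

```python
def project(point, cam_center, focal=500.0, cx=320.0, cy=240.0):
    """Project a 3D point into a pinhole camera at cam_center looking along +Z.

    Assumption: the camera has no rotation, so its pose is just a position.
    """
    x = point[0] - cam_center[0]
    y = point[1] - cam_center[1]
    z = point[2] - cam_center[2]
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z + cx, focal * y / z + cy)


def reprojection_error(points, cam_centers, observations):
    """Sum of squared pixel distances between observed and projected key points.

    observations is a list of (camera_index, point_index, (u, v)) tuples.
    """
    total = 0.0
    for cam_idx, point_idx, (u, v) in observations:
        pu, pv = project(points[point_idx], cam_centers[cam_idx])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


# Two cameras observing two key points; observations come from the true layout.
points = [(0.0, 0.0, 2.0), (1.0, 0.0, 4.0)]
cams = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
observations = [
    (c, p, project(points[p], cams[c])) for c in range(2) for p in range(2)
]

# The true layout explains every observation, so its cost is zero.
perfect = reprojection_error(points, cams, observations)
# Moving one key point away from its true position raises the cost.
shifted = reprojection_error([(0.1, 0.0, 2.0), points[1]], cams, observations)
```

A full system would search over both the camera poses and the key point positions to drive this cost down, typically with a nonlinear least-squares solver.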
That said, there are several variations of the SLAM algorithm. Some try to place every pixel in the 3D model, as opposed to only the key points. Others use a pre-processed image instead of the original, for example a grayscale version of the image, a single channel (in endoscopy usually the red one, because the veins can be seen more clearly), or any other pre-processing step that helps the feature detector find better key points.
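As a rough illustration of these pre-processing options, the sketch below extracts a single channel or a grayscale version from an RGB image. It assumes the image is a plain list of rows of `(r, g, b)` tuples; the function names are illustrative, and the grayscale weights are the standard BT.601 luma coefficients rather than anything specific to endoscopy.

```python
def extract_channel(image, channel=0):
    """Return one channel of an RGB image as a 2D list (0 = red, 1 = green, 2 = blue)."""
    return [[pixel[channel] for pixel in row] for row in image]


def to_grayscale(image):
    """Convert an RGB image to grayscale using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


# A tiny 1x2 image: one reddish pixel and one dark pixel.
frame = [[(200, 50, 30), (10, 20, 30)]]
red_only = extract_channel(frame, channel=0)
gray = to_grayscale(frame)
```

Either result can then be passed to the feature detector in place of the original frame.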
There are multiple benefits to using these algorithms in endoscopy. For example, the map created by the algorithms, together with the location of the device, can help surgeons orient themselves in the patient’s body and even assist with navigation. The 3D model can also be very useful on its own, without the localization: one use case is measuring the dimensions of any object that has been filmed from enough angles.
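The measurement use case can be sketched as a distance between two points of the reconstructed model. This example assumes the two points have already been picked from the 3D model; the `scale` parameter is an assumption that covers the fact that a map built from a single camera is generally recovered only up to an unknown scale, so a known reference (for example an instrument of known width) is needed to convert model units into millimetres.

```python
import math


def measure_distance(p, q, scale=1.0):
    """Distance between two reconstructed 3D points, converted by a scale factor.

    scale is the number of real-world units per model unit (assumed known).
    """
    return scale * math.dist(p, q)


# Two points on opposite edges of an object, in model units.
edge_a = (0.0, 0.0, 0.0)
edge_b = (3.0, 4.0, 0.0)

width_model_units = measure_distance(edge_a, edge_b)
width_mm = measure_distance(edge_a, edge_b, scale=2.0)  # if 1 model unit = 2 mm
```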
Why SLAM in Endoscopy
Most SLAM algorithms being researched target outdoor scenes, which lets them rely heavily on high variation in depth. Another key feature of outdoor scenes is the rigidity of the objects in them. Both traits are lacking in images of endoscopic surgeries. To overcome these obstacles, we have tailor-made custom SLAM algorithms that use state-of-the-art deep learning models to extract additional information from the image, together with prior knowledge of the real-life topography of the human body.
Adding information about the real depth of the scene and its actual topography allows the optimization problem to converge to a more accurate and reliable 3D model. These algorithms can create a precise 3D model of the endoscope’s environment and calculate the location of the endoscope in that model.
Using these types of algorithms allows location tracking to be introduced easily in devices that use only internal sensors. In systems that use external sensors for localization, they allow for error detection and correction, which improves accuracy and adds information about the device’s surroundings.
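One common way to write this idea down, given here as an illustrative assumption and not as the exact formulation used in our algorithms, is to add a depth term to the usual reprojection cost. Here X_j is a 3D key point, (R_i, t_i) is the pose of camera i, u_ij is the pixel where key point j was observed in frame i, π is the camera projection, d̂_ij is a depth value predicted for that pixel by a learned model, and λ is a weight that balances the two terms. All of these symbols are introduced for this sketch.

```latex
E = \sum_{(i,j)} \left\| u_{ij} - \pi\!\left(R_i X_j + t_i\right) \right\|^2
  + \lambda \sum_{(i,j)} \left( \hat{d}_{ij} - \left[ R_i X_j + t_i \right]_z \right)^2
```

The first sum penalizes key points that project to the wrong pixel. The second penalizes key points whose depth along the camera axis disagrees with the predicted depth, which gives the optimizer an extra constraint in scenes where depth variation alone is too weak.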