15.08.2021 — Julia Gong

Using Computer Vision to Identify Critical Anatomy in Surgical Images

Find out how the Stanford Medical AI and Computer Vision Lab (MARVL) used Hasty to create the largest dataset containing images taken following thyroidectomy.

About a quarter of our users here at Hasty are part of the research community. We reached out to them to learn more about their research. Some of the stories were so great that we asked the researchers to write a summary that we can share on our blog.

Today, we're excited to share Julia Gong's work. She has been a Hasty user since the early days and is studying at Stanford where she is part of the Stanford Medical AI and Computer Vision Lab (MARVL). MARVL develops artificial intelligence and machine learning algorithms to enable new capabilities in biomedicine and healthcare.

You can find Julia on Twitter, LinkedIn, or the WWW.

Why thyroidectomy?

Standardized imaging methods such as CT scans, MRIs, dermoscopic images, retinal scans, and whole-slide pathology images have enabled large-scale deep learning-powered image analysis in their respective fields. However, the implementation of computer vision methods lags behind in surgery. Standardized imaging methods are not currently integrated into the surgical workflow (especially not in open surgery), but there are a lot of opportunities for computer vision to assist the operating surgeon, which we explore in our work.

So, why the thyroid? Thyroidectomy is one of the most common operations in the United States, with about 150,000 cases performed annually [1]. Though frequently performed, thyroidectomy requires that surgeons navigate a complex microanatomy to preserve speech and swallowing and to ensure the health of nearby parathyroid glands. For instance, the human recurrent laryngeal nerve (RLN), located just under the thyroid gland, innervates the vocal cords and helps us speak, breathe, and eat safely. Given this complex anatomy, we ask two questions: Can we develop computer vision tools to assist surgeons during thyroidectomy to identify the RLN? And how do operating room image capture conditions affect such a method's performance?

What's already been done?

Early work has examined developing computer vision methods for anatomical identification (among other tasks) in endoscopic, laparoscopic, and robot-assisted surgery. While endoscopy is primarily diagnostic in nature, laparoscopic and robot-assisted surgery have a higher degree of complexity and variability, with surgical activity and instruments moving free-form in constrained fields of view. Taking it a step further, open surgery has even greater scene complexity and variability, and thus very few groups have examined anatomical identification for open surgery. Our work thus not only seeks to develop a deep learning-enabled computer vision system to identify and measure critical soft tissue during thyroidectomy but also aims to lay the foundation for future translational work bringing AI to vision and discrimination for open surgery. Please see our paper [2] for further literature and details.

How we used data to find the recurrent laryngeal nerve (RLN)

To train a computer vision model to accurately segment the RLN, we first needed to collect image data containing the RLN. We used retrospectively acquired, de-identified images taken during thyroidectomy and obtained the necessary ethical approval (see our paper for details). In total, we collected 277 color (RGB) photographs from 130 patients, which contained a diverse array of procedure types and perspectives on surgical anatomy. Due to the high diversity of images in our dataset, we also labeled images for two image conditions: lighting and distance to surgical anatomy. Analyzing our segmentation results using these image meta-tags helped us to answer our second research question: How do operating room image capture conditions affect our segmentation method's performance?
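To illustrate how image meta-tags can be used to break down segmentation results, here is a minimal sketch in plain Python. It is not the code from the paper: the Dice coefficient is assumed as the overlap metric, and the record layout, the tag names (`lighting`, `distance`), the tag values, and the toy masks are all hypothetical.

```python
from collections import defaultdict


def dice(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    intersection = sum(
        p & t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    total = sum(sum(row) for row in pred) + sum(sum(row) for row in truth)
    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice_by_tag(records, tag):
    """Group per-image Dice scores by one meta-tag and average each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[tag]].append(dice(record["pred"], record["truth"]))
    return {value: sum(scores) / len(scores) for value, scores in groups.items()}


# Hypothetical toy data: 2x2 masks with made-up tag values.
records = [
    {"lighting": "bright", "distance": "close",
     "pred": [[1, 1], [0, 0]], "truth": [[1, 1], [0, 0]]},
    {"lighting": "dim", "distance": "close",
     "pred": [[1, 0], [0, 0]], "truth": [[1, 1], [0, 0]]},
    {"lighting": "dim", "distance": "far",
     "pred": [[0, 0], [0, 0]], "truth": [[1, 0], [0, 0]]},
]

print(mean_dice_by_tag(records, "lighting"))
print(mean_dice_by_tag(records, "distance"))
```

Comparing the per-group averages side by side is one simple way to see whether a condition such as dim lighting or a distant camera is associated with weaker segmentations.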

Data annotation using Hasty. Note that this image is the same as Supplementary Figure S3 in the original paper, which is published by Scientific Reports.

To obtain the ground-truth RLN segmentations for these images, each image was carefully and manually annotated by surgeons and reviewed by a senior surgeon. We used the Hasty platform to collect these annotations from our clinical collaborators; in particular, they used the polygon and brush tools to create detailed annotations that remained faithful to the borders of the nerve tissue. Surgeons also created segmentations for retractors, which we used in the second stage of our pipeline: nerve width estimation. We also used the bounding box tool to annotate bounding boxes around the wound region for a subset of 136 images, which enabled us to train our cropping model. The cropping model was used to crop input images to the wound region prior to nerve segmentation, which reduced the clutter in images that were not centered on the area of interest. Exporting the data in a format suitable for model training took only a few clicks on the platform.
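The cropping step can be pictured with a small sketch. In the pipeline described above the wound-region box comes from a trained cropping model; here the box is simply passed in, and the function name `crop_to_box`, the `(x_min, y_min, x_max, y_max)` convention with exclusive maxima, and the optional `pad` margin are assumptions made for illustration.

```python
def crop_to_box(image, box, pad=0):
    """Crop an image (a list of pixel rows) to a bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels, with exclusive maxima.
    pad widens the box on every side and is clamped to the image borders.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x_min = max(0, x_min - pad)
    y_min = max(0, y_min - pad)
    x_max = min(width, x_max + pad)
    y_max = min(height, y_max + pad)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# Hypothetical 4x5 "image" whose pixel value encodes its row and column.
image = [[row * 10 + col for col in range(5)] for row in range(4)]

wound_box = (1, 1, 3, 3)  # made-up box standing in for a model prediction
tight_crop = crop_to_box(image, wound_box)
padded_crop = crop_to_box(image, wound_box, pad=1)

print(tight_crop)
print(len(padded_crop), len(padded_crop[0]))
```

A small margin around the box is a common choice so that anatomy lying right on the border is not cut off before the segmentation stage sees it.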

This collection of images taken following thyroidectomy (along with their annotations) is, to our knowledge, the largest such dataset to date. We are excited to be pushing the frontiers of surgical vision by presenting this dataset along with our end-to-end method.

Illustration of our dataset, which is diverse across both brightness and picture distance image conditions. Note that this image is the same as Figure 2 in the original paper, which is published by Scientific Reports.


Open surgery is a challenging environment for computer vision algorithms. In this work, we investigate anatomical identification during thyroidectomy, one common type of open surgery. Our end-to-end recurrent laryngeal nerve (RLN) segmentation and measurement method demonstrates the potential of using computer vision algorithms to augment intraoperative decision-making. Please see our paper for details on our methods, results, and analysis, which we hope will spur on further research in integrating AI methods into open surgery and surgical workflows.

