
Unsupervised segmentation of biomedical hyperspectral image data: tackling high dimensionality with convolutional autoencoders

Open Access

Abstract

Information about the structure and composition of biopsy specimens can assist in disease monitoring and diagnosis. In principle, this can be acquired from Raman and infrared (IR) hyperspectral images (HSIs) that encode information about how a sample’s constituent molecules are arranged in space. Each tissue section/component is defined by a unique combination of spatial and spectral features, but given the high dimensionality of HSI datasets, extracting and utilising them to segment images is non-trivial. Here, we show how networks based on deep convolutional autoencoders (CAEs) can perform this task in an end-to-end fashion by first detecting and compressing relevant features from patches of the HSI into low-dimensional latent vectors, and then performing a clustering step that groups patches containing similar spatio-spectral features together. We showcase the advantages of using this end-to-end spatio-spectral segmentation approach compared to i) the same spatio-spectral technique not trained in an end-to-end manner, and ii) a method that only utilises spectral features (spectral k-means) using simulated HSIs of porcine tissue as test examples. Secondly, we describe the potential advantages/limitations of using three different CAE architectures: a generic 2D CAE, a generic 3D CAE, and a 2D convolutional encoder-decoder architecture inspired by the recently proposed UwU-net that is specialised for extracting features from HSI data. We assess their performance on IR HSIs of real colon samples. We find that all architectures are capable of producing segmentations that show good correspondence with HE stained adjacent tissue slices used as approximate ground truths, indicating the robustness of the CAE-driven spatio-spectral clustering approach for segmenting biomedical HSI data. Additionally, we stress the need for more accurate ground truth information to enable a precise comparison of the advantages offered by each architecture.

Published by Optica Publishing Group under the terms of the Creative Commons Attribution 4.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

1. Introduction

Images depicting spatially resolved structural and compositional information about a sample tissue can assist pathologists in disease monitoring and diagnosis [1–4]. A given tissue component or region is generally differentiated from others by i) its constituent molecules and ii) the way in which these are typically distributed in space. This information is encoded in a Raman or infrared (IR) hyperspectral image (HSI) (an image that can be formed by collating Raman or IR spectra acquired at equally spaced points across a sample), which reveals the presence/quantity of molecular bonds unique to specific molecules at each location [4,5]. These are sometimes also referred to as ‘Raman mappings’ depending on the method of acquisition.

Spectral k-means is a common strategy for segmenting tissue components from Raman/IR HSIs [6–10]. In essence, pixels are grouped together based on the similarity of their constituent spectra. Provided a suitable number of cluster groups is chosen, the resultant cluster map (an image formed by displaying the cluster group assigned to each spectrum in the HSI) reveals where particular molecules reside in the sample, and can be used to segment a tissue into regions defined by their distinct molecular contents. But as stated previously, a given tissue component or region may also be defined by its unique arrangement of molecules (i.e. how their constituent spectra are arranged in space). Spectral k-means is inadequate for segmenting tissue regions in this way as it does not exploit the spatial arrangement of neighbouring spectra (separate from the lack of spatial context, the quality of measured spectra may also affect the quality of the outputs [11]). Instead, a more optimal approach would involve i) devising a method for detecting how spectra are arranged in various regions of the HSI and ii) finding a way to cluster regions containing similar arrangements of spectra. However, the most effective way to detect and manipulate the spatial and spectral features unique to each sample component remains an open question. This is particularly challenging given the high dimensionality of HSI datasets, which demand large amounts of memory and long computation times to process.
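To make this baseline concrete, a minimal sketch of spectral k-means is shown below (Python with NumPy and scikit-learn; the random cube and all variable names are illustrative stand-ins, not the data or code used in this work):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_kmeans(hsi: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    """Group each pixel by its spectrum alone (no spatial context)."""
    h, w, b = hsi.shape
    spectra = hsi.reshape(-1, b)                 # one row per pixel
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(spectra)
    return labels.reshape(h, w)                  # the resulting cluster map

# Toy example: a random cube standing in for a measured HSI
cluster_map = spectral_kmeans(np.random.rand(64, 64, 800), n_clusters=3)
```

Because each pixel is clustered independently, two pixels with identical spectra always receive the same label regardless of their surroundings, which is exactly the lack of spatial context discussed above.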

Recently, deep learning approaches have emerged as promising tools for segmenting HSI data [12]. Compared to ‘classical’ spatio-spectral segmentation approaches such as phasor analysis, extended morphological profiles, and sparse representation models [2,3,12–20], networks learn optimal feature extraction automatically and are therefore more readily applicable to problems where it may be challenging to hand-engineer desirable features and/or consistently extract them from the data. Several architectures have been used for this task, such as recurrent neural networks [21], transformers [22] (both extract long- and short-range spectral features by treating HSI datacubes as sequences), graph convolutional networks (effective at capturing long-range spatial dependencies) [23], and generative adversarial networks [24] (improving network generalisability with an adversarial training scheme). However, architectures based on convolutional networks appear to be the most commonly implemented. All of these have produced promising results on various HSI datasets in supervised or semi-supervised schemes [12,15,25–40]. However, in response to the usual lack of paired training data, effort has been placed towards developing fully unsupervised strategies (including for the separate tasks of spectral unmixing [41] and virtual staining [42,43]).

Most reported methods for performing segmentation without ground truths utilise conventional analysis approaches as opposed to deep networks [44–53]. However, one approach based on the use of fully convolutional autoencoders (CAEs) stands out in this respect [46,54,55]. Here, the HSI is decomposed into patches and a CAE is used to detect and compress information about spatial and spectral features found in each patch into low-dimensional latent vectors to enable their clustering in an end-to-end fashion (therefore, each stage of the data processing is specifically optimised for the target segmentation task) [54]. This technique belongs to a broader class of approaches that use supervisory signals from a secondary clustering step to enhance the quality of the learned latent representation via a unified learning objective [56–59]. CAEs (as opposed to other architectures such as stacked autoencoders [57,60–64], or more generic autoencoders [65,66]) are well-suited to processing HSI data as they can easily manipulate structured/high-dimensional image data, are specialised to extract spatial features, and have a straightforward training procedure. Similar approaches to extracting features with CAEs have been used in [38,40,67–74], but the subsequent segmentation was not learned in a completely unsupervised setting in these cases. Although this approach has been shown to be effective at performing generic image classification tasks [56,75] (and has been applied to satellite HSI data [54], though only achieving relatively poor segmentation quality), there may be room to further optimise the architecture for processing biomedical HSI data. For Raman and IR HSIs, it is the location of peaks/spectral features along a spectrum that encodes information about molecular contents [76], and it is the spatial arrangement of different molecules (represented as unique spatial arrangements of peaks or other spectral features specifically found within a subset of each band in the HSI) that can be used to define a tissue type. Therefore, it is important to ensure that the chosen network is optimised to detect spatio-spectral features that only reside in specific regions of the spectral band. However, generic architectures may not be the most effective at achieving this functionality.

In [54], HSIs are treated as single-channel 3D images and processed with 3D convolutional layers. However, 3D convolutional layers are not optimised to detect subtle but important features that may be unique to only a portion of the spectral band; instead, they are predisposed to learn filters that detect features that may be present anywhere within the input volume, and may therefore miss, or be much less efficient at detecting, subtle but relevant features that reside in specific portions of the band. Networks composed of 2D convolutional layers are typically used to extract features from multi-channel 2D images [77]. However, these are also sub-optimal, as they are designed to detect features that span the whole spectral band, or to simultaneously detect multiple features across the whole band [78,79]. In both cases, the learned representation of patch features may not encode the information most relevant to segmentation, motivating a more efficient feature extraction framework. Figure 1 shows how the filters are applied in both cases, and more details about this problem are provided in Section 1 of Supplement 1.


Fig. 1. Schematic showing how 2D convolutional filters (red) are applied to multi-channel 2D images, and 3D convolutional filters (also red) are applied to single channel 3D images. 2D filters span the whole spectral band and are ‘scanned’ across the sample in the x-y dimensions [78,79]. In contrast, 3D filters are typically smaller in the band dimension ‘b’, and take steps in all three dimensions. 2D filters are capable of detecting features that span the whole spectral band (or several features found across the band) but may lack sensitivity to subtle features that may occur only within a specific subset of the band. 3D filters also lack this sensitivity, as they are predisposed to detect more generic features that may occur anywhere within the HSI.

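To make Fig. 1 concrete, the sketch below (PyTorch; all shapes, filter counts, and kernel sizes are illustrative assumptions rather than the architectures used later) contrasts how the two filter styles are applied to the same $20\times 20\times 800$ patch:

```python
import torch
import torch.nn as nn

patch = torch.rand(1, 800, 20, 20)            # (batch, bands, x, y)

# 2D filters: each kernel spans the whole spectral band at once
conv2d = nn.Conv2d(in_channels=800, out_channels=16, kernel_size=3, padding=1)
print(conv2d(patch).shape)                    # torch.Size([1, 16, 20, 20])

# 3D filters: small in the band dimension 'b' and scanned along it,
# so a filter may respond anywhere in the volume
volume = patch.unsqueeze(1).permute(0, 1, 3, 4, 2)   # (batch, 1, x, y, b)
conv3d = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=(3, 3, 7),
                   padding=(1, 1, 3))
print(conv3d(volume).shape)                   # torch.Size([1, 16, 20, 20, 800])
```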

In contrast to these architectures, the recently proposed UwU-net first processes the input HSI with a series of 2D convolutional layers, ultimately outputting fewer activation maps than the length of the input’s spectral band [15]. The resulting activation maps are then split along the channel dimension, and each is fed into its own ‘inner’ U-Net [80]. Their outputs are then concatenated and processed by another series of convolutional layers to produce the final output. With this architecture, the filters of each inner U-Net are specialised to detect the presence of features encoded in the activation map they are processing. This is unlike generic U-Nets, where filters may learn to detect features encoded in all of the activation maps output by the first set of convolutional layers. Experiments have shown that UwU-net architectures outperform more generic U-Nets on supervised image-to-image regression tasks involving HSI data [15]. In these settings, specialising feature extraction in this manner provides a clear advantage over generic 2D architectures: allowing the network to analyse certain activation maps in isolation at some stages (while also learning the kinds of features that should be processed by each inner U-Net) appears to improve the network’s ability to detect useful features contained within HSI datasets. Therefore, it is of interest to observe whether this feature extraction framework may confer any advantages for unsupervised segmentation.
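A minimal sketch of this branch-and-merge idea is given below (PyTorch; the layer sizes are assumptions, and short convolutional stacks stand in for the inner U-Nets for brevity, so this illustrates the splitting rather than reproducing the published UwU-net):

```python
import torch
import torch.nn as nn

class UwUStyleBlock(nn.Module):
    def __init__(self, bands=800, n_branches=4):
        super().__init__()
        # Head: compress the band dimension into n_branches activation maps
        self.head = nn.Sequential(nn.Conv2d(bands, n_branches, 3, padding=1),
                                  nn.ReLU())
        # One 'inner' network per activation map, applied in isolation
        self.inner = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(8, 1, 3, padding=1))
            for _ in range(n_branches))
        # Tail: merge the branch outputs with further convolutions
        self.tail = nn.Conv2d(n_branches, n_branches, 3, padding=1)

    def forward(self, x):                     # x: (batch, bands, h, w)
        maps = self.head(x)                   # (batch, n_branches, h, w)
        outs = [net(maps[:, i:i + 1]) for i, net in enumerate(self.inner)]
        return self.tail(torch.cat(outs, dim=1))

print(UwUStyleBlock()(torch.rand(1, 800, 20, 20)).shape)  # [1, 4, 20, 20]
```

Each branch only ever sees its own activation map, so its filters specialise to the features the head has routed to it.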

1.1 Outline of experiments

1.1.1 Synthetic fat/muscle HSIs

First, we demonstrate how, unlike spectral k-means, this spatio-spectral approach enables the segmentation of tissue sections containing similar molecular contents based on the way their constituent spectra are arranged in space. This is illustrated using synthetic Raman HSIs of porcine tissue divided into sections, each containing a distinct fat distribution. We compare an end-to-end spatio-spectral clustering strategy, the same approach trained without the end-to-end scheme, and spectral k-means.

1.1.2 Real colon HSIs

To assess our network’s ability to segment more complex samples (requiring the extraction of a much larger set of features), we applied it to real IR HSIs of colon tissue. We show that the resultant segmentations have good correspondence with the contents of HE stained adjacent tissue slices used as approximate ground truths. We also compare our results to those acquired with more generic architectures based on 2D and 3D convolutional layers to assess whether any particular strategy may fail to reproduce major morphological features known to be present in the sample.

2. Method: segmenting synthetic Raman HSIs

We applied our UwU-net inspired patch-based clustering approach to synthetic Raman HSIs of muscle tissue divided into sections, each containing a distinct fat distribution. We used two training strategies: an end-to-end training framework, and one that performed feature mining and clustering separately. Here, we describe the preparation of the training data and the architecture used for training.

2.1 Data preparation

The synthetic HSIs depict generic porcine muscle tissue with fat distributions designed to mimic simple patterns that may be found in real tissue (though only approximately). The muscle and fat spectra used to construct the HSIs (shown in Fig. 2) were acquired from back bacon samples (with dimensions of approximately 3 cm$\times$3 cm$\times$3 mm) placed onto stainless steel slides (optical wavelength images of similar samples can be found in [81]). Each of these spectra is a single measurement as opposed to an averaged spectrum. Prior to measurement, samples were stored at 4-6 $^{\circ }$C. Measurements were acquired with an InVia system (Renishaw, Wotton-under-Edge, UK) with an excitation wavelength of 830 nm, a power of 130 mW, a 50$\times$ long working distance objective, a 600 L/mm grating, and a 3s $\times$ 10 exposure time. The Raman spectrometer was calibrated using the silicon peak at 520.5 cm$^{-1}$. Two point measurements were taken at two different positions on the same sample, one containing fat and one containing muscle. The spectra were not pre-processed with a baseline correction or smoothing.


Fig. 2. Left: An example $240\times 240$ pixel tissue mask used in the construction of a synthetic HSI. Three different fat patterns are embedded in the muscle - a solid section, a striped section, and a section containing a random distribution of fat ‘globules’. Right: Raman spectra of the porcine fat and muscle samples (back bacon) used to construct synthetic HSIs. These spectra were not pre-processed or normalised, and were acquired with an InVia imaging system (Renishaw, Wotton-under-Edge, UK) with an excitation wavelength of 830 nm, a power of 130 mW, a 50$\times$ long working distance objective, a 600 L/mm grating, and a 3s$\times$10 exposure time.


Each HSI was constructed from a $240\times 240$ pixel (referred to as the $x$ and $y$ dimensions respectively) binary tissue mask created by assigning fat patterns to different predefined sections (an example is shown in Fig. 2). Each tissue model contained at least one section of solid fat, one section of striped fat, and one section with randomly distributed ‘globules’ of fat. Each fat pixel was assigned the same fat spectrum. Similarly, one muscle spectrum was assigned to each muscle pixel. 800 Raman shift values were used for each spectrum (ranging from 444.2 cm$^{-1}$ to 2035.2 cm$^{-1}$ in steps of 1.98 cm$^{-1}$). Therefore, each HSI had dimensions of $240\times 240\times 800$. It is evident that the synthetic tissue models only represent highly approximate versions of real tissue. A discussion of i) the effect this has on the robustness of this study and ii) the advantages/disadvantages of using other methods for generating synthetic images are provided in Section 2 of Supplement 1.
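A minimal sketch of this construction is given below (NumPy; the toy mask and random spectra are stand-ins for the measured fat/muscle spectra, and all names are illustrative):

```python
import numpy as np

def build_synthetic_hsi(mask: np.ndarray, fat: np.ndarray,
                        muscle: np.ndarray) -> np.ndarray:
    """Assign the fat spectrum to fat pixels (mask == 1) and the muscle
    spectrum to muscle pixels, giving a (240, 240, 800) datacube."""
    return np.where(mask[..., None].astype(bool), fat, muscle)

mask = np.zeros((240, 240))
mask[:80, :80] = 1                                  # toy solid-fat section
hsi = build_synthetic_hsi(mask, np.random.rand(800), np.random.rand(800))
print(hsi.shape)                                    # (240, 240, 800)
```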

Once constructed, each HSI was then decomposed into 3136 $20\times 20\times 800$ patches, by taking steps of 4 pixels in a raster scan style fashion in the $x$ and $y$ dimensions. All negative values were zeroed. Patches were subsequently normalised by dividing them by the max pixel amplitude from the whole HSI. Whole synthetic HSIs are depicted in Fig. S1 in Supplement 1.
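The decomposition and normalisation can be sketched as follows (NumPy; a loop-based implementation kept small with a toy cube, since stacking all 3136 real 20$\times$20$\times$800 patches at once is memory-hungry):

```python
import numpy as np

def extract_patches(hsi: np.ndarray, size: int = 20, step: int = 4):
    hsi = np.clip(hsi, 0, None)                  # zero all negative values
    hsi = hsi / hsi.max()                        # normalise by the HSI maximum
    h, w, _ = hsi.shape
    return np.stack([hsi[i:i + size, j:j + size]
                     for i in range(0, h - size + 1, step)
                     for j in range(0, w - size + 1, step)])

# Toy cube with 8 bands; a 240x240 image with this size/step yields
# ((240 - 20) / 4 + 1)^2 = 3136 patches, as stated above
patches = extract_patches(np.random.rand(60, 60, 8))
print(patches.shape)                             # (121, 20, 20, 8)
```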

2.2 Network architecture outline

The architecture features two modules, an autoencoder module $A$ and a clustering module $C$ (see Fig. 3).


Fig. 3. UwU-net inspired segmentation network architecture. The CAE module (A) is used to mine features from the input HSI patch and compress them into a 100 element latent vector. The architecture is inspired by the UwU-net, which is hypothesised to provide a more effective feature extraction framework compared to more generic CAE architectures. The clustering module (C) outputs a probability that the patch belongs to each cluster. With the end-to-end training scheme, (A) is initially trained in isolation. After this pretraining step, both (A) and (C) are trained together.


2.2.1 Autoencoder module

Here, the autoencoder module is used to ‘mine’ features from the input patches that can be used for their subsequent clustering. A CAE learns to i) detect and compress spatio-spectral features contained in each image patch into a set of low-dimensional latent vectors, and ii) subsequently reconstruct the input from this compressed representation. The latent vectors encode information about the spatial and spectral features contained in the input. As a consequence, they can be used to group together patches containing similar contents.

More specifically, the autoencoder module $A$ consists of two components: i) an encoder $A_E$ that learns a mapping from an input HSI patch $x_i \in X$ to a set of low-dimensional latent vectors $l_{i,m} \in L$, $A_E: X \rightarrow L$, where $i$ indexes each patch, and $m$ indexes the latent vector produced by each CAE branch (explained in Section 1 of Supplement 1); and ii) a decoder $A_D$ that learns a mapping from the input patch’s set of latent vectors $l_{i,m}$ to a reconstruction of the original input $\hat {x_i} \in \hat {X}$, $A_D: L \rightarrow \hat {X}$. Further details about the structure of $A$ and the motivation for its design are discussed in Section 1 of Supplement 1.
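The two mappings can be sketched as a single-branch stand-in (PyTorch; the layer sizes are assumptions and the 100-element latent vector follows Fig. 3, but the real multi-branch module is described in Section 1 of Supplement 1):

```python
import torch
import torch.nn as nn

class MiniCAE(nn.Module):
    def __init__(self, bands=800, latent=100):
        super().__init__()
        # A_E: compress a (bands, 20, 20) patch into a latent vector l_i
        self.encode = nn.Sequential(
            nn.Conv2d(bands, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 10 * 10, latent))
        # A_D: reconstruct the patch x_hat_i from l_i
        self.decode = nn.Sequential(
            nn.Linear(latent, 16 * 10 * 10), nn.ReLU(),
            nn.Unflatten(1, (16, 10, 10)),
            nn.ConvTranspose2d(16, bands, 4, stride=2, padding=1))

    def forward(self, x):
        l = self.encode(x)
        return self.decode(l), l

x = torch.rand(2, 800, 20, 20)
x_hat, l = MiniCAE()(x)
loss = nn.functional.mse_loss(x_hat, x)          # reconstruction loss
```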

2.2.2 CAE+k-means segmentation

Once trained, the set of latent vectors associated with each input image patch will encode information that enables the decoder to perform the reconstruction. This should consist of information about the spatial and spectral features contained in each patch. Regions of the HSI that share similar spatial and spectral features can therefore be segmented by grouping together patches that have similar latent vectors. This can be performed by i) concatenating the latent vectors produced from each patch into a single 1D vector ($\overline {l_i}$), then ii) clustering them with the corresponding $\overline {l_i}$ produced from all other patches (e.g. with k-means), and then iii) performing a reconstruction step to acquire the resultant segmentation image. Taking inspiration from [75], we refer to this style of segmentation as ‘CAE+k-means’ clustering (see Section 3.A of Supplement 1). However, with this approach, the CAE does not compress input patches in a way that is optimised for their subsequent clustering; instead, it solely prioritises reconstruction quality.
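A minimal sketch of steps i)-iii) is given below (NumPy/scikit-learn; `latents` stands in for the concatenated vectors produced by a trained encoder, and the overlap-aware reconstruction of the segmentation image is simplified to a per-patch label grid):

```python
import numpy as np
from sklearn.cluster import KMeans

def cae_kmeans(latents: np.ndarray, n_clusters: int, grid: tuple) -> np.ndarray:
    """Cluster the concatenated latent vectors and paint the labels
    back onto the grid of patch positions."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(latents)
    return labels.reshape(grid)

# 3136 patches on a 56x56 grid, with 100-element latent vectors
seg = cae_kmeans(np.random.rand(3136, 100), n_clusters=4, grid=(56, 56))
```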

2.2.3 Clustering module and end-to-end clustering

To improve the quality of cluster assignments, the architecture also contains a clustering module $C$ that learns a mapping $C: L \rightarrow O$, where $o_{i,j} \in O$ is a set of ‘soft-assignments’, each describing the confidence that the patch (indexed with $i$) should be assigned to each of the clusters (indexed with $j$, where the total number of clusters is a user-defined parameter). The clustering loss is used with the reconstruction loss to update the parameters of $A$ and $C$ to ensure that features are extracted and compressed in a way that optimises the ‘accuracy’ of their subsequent cluster assignment performed by $C$. In $C$, the set of latent vectors associated with each input patch are concatenated into a 1D vector $\overline {l_i}$ that is then processed by a clustering layer, where the Student’s t-distribution is used to compute $o_{i,j}$ for each $\overline {l_i}$. We call this style of clustering ‘end-to-end clustering’. Details about how the clustering loss is calculated and utilised in the training process are described in Section 3.B of Supplement 1.
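The soft assignment itself takes only a few lines (PyTorch; this follows the standard Student's t-distribution kernel with an assumed degrees-of-freedom parameter $\alpha = 1$, and random tensors stand in for the latent vectors and learned cluster centres):

```python
import torch

def soft_assign(l_bar: torch.Tensor, centres: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
    """l_bar: (n_patches, d) latent vectors; centres: (k, d)."""
    d2 = torch.cdist(l_bar, centres) ** 2            # squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)            # each row sums to 1

o = soft_assign(torch.rand(3136, 100), torch.rand(4, 100))
print(o.shape)                                       # torch.Size([3136, 4])
```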

2.3 Further details

Further details about the execution of each training scheme are provided in Section 3 of Supplement 1.

3. Method: segmenting real IR HSIs

3.1 Preparation of real IR HSIs of colon tissue

We applied end-to-end clustering (using the framework described in Section 3.B of Supplement 1) to a set of three colon HSIs from the Minerva dataset [82]. Each image has a corresponding HE slide of an adjacent tissue slice. The three images shown here were chosen based on two criteria: i) they had a large number of glands present (therefore allowing us to assess the network’s ability to segment a large set of components), and ii) their corresponding HE slide shows good correspondence with tissue contents as determined with spectral k-means (i.e. their HE slide can be used as an approximate ground truth). To enable the most effective assessment of segmentation quality possible, samples with the highest quality ground truths were selected for presentation in the article.

Colon slices contain several tissue types and components, the most prominent being the intestinal glands, which are themselves composed of several subcomponents: the lumen, epithelial cells, stroma, and nuclei [83]. Given the pixel size of the images ($5.5\times 5.5$ $\mu m^{2}$), it is not clear whether these subcomponents are resolvable. At the very least, we aim to segment the areas occupied by glands, as well as other neighbouring tissue types that can be clearly observed in the corresponding HE slides.

The HE slide corresponding to each HSI depicts the morphology of an adjacent tissue slice. Therefore, the presence and shape of some components may not exactly reflect those found in the tissue sample depicted by the HSI. With that said, the contents are expected to be similar, e.g. glands should remain clustered in similar regions, though in some cases the shapes and number of glands can differ significantly. Furthermore, the HE staining does not reveal all regions that may have different molecular contents, instead highlighting nuclei and the extracellular matrix/cytoplasm to reveal differences in morphology [84]. Ultimately, the HE slides allow us to evaluate whether any architecture fails to reproduce major morphological features of the samples (i.e. whether they generally appear in their expected locations and with similar shapes). However, it is evident that these slides cannot be used to precisely evaluate the quality of the segmentations produced with any of the chosen architectures.

The images were acquired with an Agilent 620 FTIR microscope coupled with an Agilent 670 FTIR spectrometer with a Globar light source, and a liquid-nitrogen cooled 128 $\times$ 128 FPA detector. The resultant images have a $5.5 \times 5.5 \mu m^2$ pixel size with a $704 \times 704 \mu m^2$ field of view. Measurements were acquired in the mid-IR spectral range of 1000–3800 cm$^{-1}$ at a spectral resolution of 4 cm$^{-1}$. Samples were ‘electronically de-paraffinised’ using a modified extended multiplicative signal correction. The spectral band was truncated to 1000-1800 cm$^{-1}$, and the spectra were interpolated to ensure each point was separated by 1 cm$^{-1}$. Further information about sample preparation, image preprocessing, and instrument details can be found in [85].

Images were prepared by taking $10\times 10$ patches in steps of 2 pixels using the same procedure described in Section 2.1. The whole images are depicted in Fig. S2 of Supplement 1.

3.2 Learning unsupervised segmentation with three different CAE-based architectures

Three different architectures were applied to the IR HSIs (all sharing the same two-component structure shown in Fig. 3, and trained in an end-to-end manner): one where $A$ was a generic 3D CAE (Fig. S3 in Section 5 of Supplement 1), one where $A$ was a generic 2D CAE (Fig. S4), and a UwU-net inspired architecture (Fig. S5). An early stopping strategy was used to determine the stopping point of the CAE pretraining step for the UwU-net and the generic 2D CAE. Each network was trained with Adam as the optimiser, a batch size of 58, and a learning rate of 0.001. Once pretrained, both modules were trained together until fewer than 0.1% of assignments changed after a given epoch, or for a maximum of 50 epochs.
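The stopping rule for the combined training step can be sketched as follows (Python; `train_one_epoch` and `hard_assignments` are hypothetical stand-ins for the real joint optimisation step and the argmax over the soft assignments, so only the bookkeeping is meaningful):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_one_epoch():                    # stand-in for the joint training step
    pass

def hard_assignments(n=3136, k=6):        # stand-in: argmax of the soft labels
    return rng.integers(0, k, size=n)

prev = None
for epoch in range(50):                   # hard cap of 50 epochs
    train_one_epoch()
    current = hard_assignments()
    if prev is not None and np.mean(current != prev) < 0.001:
        break                             # fewer than 0.1% of labels changed
    prev = current
```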

The generic 3D CAE architecture was pretrained for 40 epochs, and the maximum number of epochs for the combined training step was set to 20 to limit the total training time to approximately one day. In some cases, the training of the generic 2D and 3D CAEs had to be restarted a few times before the expected exponential decay of the validation loss was observed.

In an attempt to give both styles of 2D network similar expressive power, the generic 2D CAE was constructed to utilise a similar number of filters and the same latent vector size as the UwU-net inspired architecture. This was done to ensure any observed differences in their performance could be attributed to differences in the splitting/branching of the layers. The 3D network was coded to have a similar structure to the 2D networks, but featured fewer layers and filters to help reduce computational time. It is not clear how much this may have affected the expressive power of this network relative to the 2D networks, as the architecture manipulates information about features detected with 3D convolutional filters as opposed to those detected with multi-channel 2D filters (i.e. perhaps fewer 3D filters are needed to achieve similar expressive power to the 2D networks). Therefore, it is important to bear in mind that the comparison shown here is not necessarily between two networks with similar expressive power relative to their dimensionality. Nevertheless, the comparison still has some value in the sense that it shows the results that can be produced with either architecture given a clinically relevant time window of one day for training (the time between image acquisition and analysis of morphological information should ideally be as short as possible). These network architectures are shown in Section 5 of Supplement 1, while the CAE pre-training loss curves are shown in Section 6.

4. Results

4.1 Segmentation of synthetic Raman HSIs

As expected (see Fig. 4), spectral k-means was able to accurately differentiate muscle and fat pixels into two distinct clusters (a trivial task, given the properties of each image). The third cluster group is hardly utilised, which is expected given i) the algorithm groups pixels by spectral features alone, and ii) there are only two types of spectra in the images (fat and muscle). It is evident that this approach cannot be used to segment each tissue section as it does not utilise information about their constituent spatial features. Figure 4 also shows the results of our segmentation algorithms (CAE+k-means and end-to-end clustering) that utilise both spatial and spectral features. These were successful in distinguishing each tissue section. A normalised mutual information (NMI) score and adjusted Rand score (ARS) were used to compare the accuracy of the CAE+k-means and end-to-end clustering approaches (see Table 1). For Samples 1 and 3, the end-to-end clustering scores were notably higher than those of CAE+k-means, and for the remaining sample, the accuracy of both techniques was comparable. The benefits of the end-to-end approach are most clearly observed in Sample 1 in Fig. 4. Here, several regions within the striped section of the tissue are erroneously assigned to the same class as the globule section of the tissue using the CAE+k-means approach. The number of erroneous cluster assignments is significantly reduced in the end-to-end segmentation output. All segmentations are offset from the ground truth by a distance related to the tiling step size, which lowers the segmentation quality scores. This is discussed further in Section 4.2.
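For reference, both scores can be computed from flattened label maps with scikit-learn (random stand-in labels below, not the data behind Table 1):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

truth = np.random.randint(0, 4, 240 * 240)       # stand-in label maps
pred = np.random.randint(0, 4, 240 * 240)
print("NMI:", normalized_mutual_info_score(truth, pred))
print("ARS:", adjusted_rand_score(truth, pred))
```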


Fig. 4. Segmentation results and ground truth for each synthetic HSI. All images shown here have been assigned random colour scales.



Fig. 5. Segmentations of IR HSIs of real human colon samples produced with end-to-end clustering using various architectures. The HE stained adjacent tissue slices used as approximate ground truths are also depicted. All three architectures produced comparable results, and appear to be able to segment the locations of most glands, though a precise evaluation of segmentation accuracy is not possible given the approximate nature of the ground truth.



Table 1. NMI and ARS scores for the segmentations produced for each synthetic HSI.

For all samples, patches found near the boundary between two tissue regions were assigned to the same group as fat globule patches (i.e. their latent vectors were most similar to those representing globule patches). The solid and striped sections have distinct features that are fairly consistent across all patches, whereas there is greater variability and randomness in the features found in globule patches. Therefore, it is not entirely surprising that patches with an assortment of different fat patterns adjacent to one another would be grouped into the ‘globule’ class. The extent of this decreased slightly with the end-to-end clustering approach. A smaller patch size might help reduce the spatial extent of these artefacts, though this should be done with caution as each tissue region is defined by spatial patterns that span a certain range, and the spatial context represented in each patch should ideally span this scale.

It is important to note that these results do not indicate how successful this approach may be in more practical settings, as our model tissues represent idealised samples for several reasons. These are described in depth in Section 2 of Supplement 1. Furthermore, the few samples shown here do not represent the full range of possible tissue types. Therefore, the performance of our approach on other tissue types remains somewhat unclear. Nonetheless, this experiment demonstrates our UwU-net inspired architecture’s capability to utilise both spatial and spectral information to segment tissue regions primarily differentiated by the way their constituent molecules are arranged in space, and that training it in an end-to-end manner can improve the accuracy of the resultant segmentation for some examples.

There are two other limitations that are important to note. The first is that this algorithm may only be used to segment regions whose defining features span the patch size, though patches of various dimensions can easily be accommodated by altering the network architecture. Secondly, the accuracy is dependent on the initialisation of cluster centres performed with k-means. If there is significant class imbalance (i.e. one tissue class occupies a smaller area than the others), then this could lead to errors in the resultant segmentation [86].

4.2 Segmentation of real IR colon HSIs

All three CAE architectures produced segmentations qualitatively comparable to the corresponding HE stains (Fig. 5). Interestingly, all contained prominent artefacts at the interface between the tissue sample and background - different cluster groups appear to layer on top of each other to form the boundary. We hypothesise that unique latent vectors are required to reconstruct patches containing varying proportions of background and tissue, resulting in the border being defined by a large number of cluster groups and consequently the production of these ‘layer-like’ artefacts. Despite the presence of these artefacts, the border regions appear to maintain the same morphology as that depicted in the HE stain - the boundaries remain smooth, and the shape of the sample remains easy to assess. However, the various border classes do not appear to correspond to distinct morphological features, and we therefore advise disregarding them when assessing tissue contents.

Given that the HE slides depict an adjacent tissue slice and thus only provide approximate information about the morphology of the sample’s components, a precise assessment of segmentation quality (or comparison between all three architectures) is not possible. However, it is expected that the morphology of the adjacent slice should share similar structural characteristics with the sample depicted in the HSI (e.g. glands gathered in similar locations). Therefore, the HE stains allow us to assess whether any particular architecture completely fails to segment any major morphological features expected to be present in the sample. More specifically, a broad measure of segmentation quality may be assessed by whether the segmentations contain most of the components found in their corresponding HE slide, and whether these components reside in similar locations.

Each architecture was capable of producing segmentations that satisfy these criteria, demonstrating that each has at least some basic ability to segment the larger components present in the samples, and showing the robustness of this CAE-based approach for segmenting real biomedical HSI data. In addition to gland segmentation, the algorithms appeared to differentiate other tissue sections. However, these are not differentiated in the corresponding HE stains, making it challenging to discern whether i) there really are unique tissues present, and ii) their boundaries have been accurately segmented. Smaller scale morphological features (e.g. the lumen) may be elucidated by the use of a larger number of clusters, as can be seen in Section 7 of Supplement 1. However, without a precise ground truth, it is not possible to evaluate how accurately these objects may have been segmented, or whether they are truly present in each location. Therefore, we focus our attention on large-scale morphological features instead.

The segmentations appear distorted at the bottom and right edges. This is a consequence of how segmentation images are reconstructed (by overlapping tiles in a left-to-right, top-to-bottom fashion) and the size of each image patch. A patch placed in the last column will only be overlapped by the patch placed below it, while a patch placed in the final row will only be overlapped by the patch to its right. This produces block-like/smeared artefacts with thicknesses equal to the patch size at the boundaries that can be easily cropped away. Aside from these artefacts, the reconstruction procedure may also subtly translate the position of objects by an amount that appears to be related to the chosen step size. This was observed in the Indian Pines dataset experiment (Section 8 in Supplement 1), and may be easily corrected with additional cropping. This artefact also affected the segmentations produced from the synthetic HSIs mentioned in the previous section, and contributed to lowering the segmentation quality scores.
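The origin of these border artefacts can be illustrated with a simplified reconstruction routine (NumPy; later patches simply overwrite earlier ones rather than being blended, and the sizes match the synthetic HSIs above):

```python
import numpy as np

def reconstruct(labels: np.ndarray, grid: tuple, size: int, step: int):
    """Paint per-patch labels back into an image, tile by tile."""
    h = (grid[0] - 1) * step + size
    w = (grid[1] - 1) * step + size
    seg = np.zeros((h, w), dtype=int)
    for idx, lab in enumerate(labels):
        i, j = divmod(idx, grid[1])
        seg[i * step:i * step + size, j * step:j * step + size] = lab
    # The final row/column of tiles is never overwritten, producing the
    # block-like border; cropping by the patch size removes it
    return seg[:h - size, :w - size]

seg = reconstruct(np.random.randint(0, 4, 56 * 56), (56, 56), size=20, step=4)
print(seg.shape)                                 # (220, 220)
```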

4.2.1 Using additional cluster groups

It may be possible to elucidate smaller scale morphological features from the colon images by using a larger number of cluster groups. For example, it appears as though the lumen can be discerned in colon Sample 1 with the use of 40 cluster groups with the UwU-net inspired architecture (see Fig. S10 in Section 7 of Supplement 1). However, without a precise ground truth, it is not possible to assess how accurately these objects may have been segmented. The segmentations produced using additional cluster groups for the other two architecture types are shown as well.

5. Conclusion

We have investigated the use of CAEs to perform unsupervised segmentation of Raman/IR HSIs by grouping regions with similar spatial and spectral features. The architectures contain two components: an autoencoder module that mines features from HSI patches and encodes them in low dimensional latent vectors, and a clustering module that then groups similar latent vectors together (akin to grouping together image patches based on the similarity of their constituent features). All of this can be performed/learned in an end-to-end fashion, ensuring the feature mining and compression are optimised for the subsequent clustering step. With a simulation study, we have shown that this approach is particularly useful in cases where different regions of a tissue contain similar molecular contents but may be differentiated by the way their constituent spectra are arranged in space. We also showed how the end-to-end nature of the architecture can improve the quality of the segmentations compared to a strategy that performs the feature mining and clustering in completely separate steps.

To assess whether these approaches can produce segmentations in the face of a much larger feature set that would normally be encountered in real HSI data (as well as experimental artefacts), we tested three autoencoder architectures on real IR HSI data: a generic 2D convolutional autoencoder, a generic 3D convolutional autoencoder, and a 2D architecture inspired by the recently proposed UwU-net. All segmentations were comparable and had good correspondence with HE stains used as approximate ground truths, indicating that this approach can cope with the diverse set of features found in real biomedical HSI data and other confounding experimental factors. Despite the hypothesised advantages of the UwU-net inspired architecture, no significant qualitative differences were observed when compared to the segmentations produced with the generic 2D or 3D CAEs. However, a precise/quantitative comparison between each architecture was not possible given the approximate nature of the ground truth HE stains, which depict an adjacent tissue slice and therefore do not provide information about the exact morphology/boundaries of each relevant component within the HSI data. Therefore, the true extent of the benefits any particular architecture may provide is uncertain. Additionally, the number of samples used in this study is small, and does not represent the full range of different features/sample types that may be encountered in practice (e.g. no tumoral samples were used in this study). Therefore, there remains some degree of uncertainty as to the quality we could expect from using this approach on a wider selection of samples.

It may be possible to use alternative datasets with accurate ground truths to compare architecture performance, such as satellite images of geographical landscapes. However, there are two potential issues with this. The first is that it is not clear whether features residing in specific portions of the spectral band are as relevant as they are in biomedical HSI data. Therefore, these datasets may not allow us to observe the expected benefit from using any particular architecture for segmenting biomedical HSI data. Secondly, the accuracy of the approach depends on the accuracy of the initialised k-means cluster centres. Poor accuracy has been reported on the Indian Pines dataset and others in [54], and confirmed separately in our own experiment (see Section 8 in Supplement 1). Therefore, it is likely that these kinds of datasets are a poor choice for comparing the performance of our architectures.

Another broader limitation of this approach is that it is most effective when segmenting regions whose defining features span the patch size. Therefore, it may not be suitable for segmenting tissue regions whose characteristic spatial distribution of molecules spans large distances. Secondly, as mentioned above, the accuracy of the initialised cluster centres strongly determines the quality of the output segmentation. Therefore, this approach may be less effective in cases where there is an imbalance in the representation of tissue sections (i.e. one tissue class occupies a significantly smaller area than the others) [86]. Furthermore, the segmentations suffer from subtle smearing/translation artefacts, though these appear to be straightforward to correct by cropping the image using prior knowledge of the patch size and step size.

As it currently stands, the results obtained with each architecture are the first to demonstrate the robustness of the CAE-driven spatio-spectral clustering approach for segmenting major tissue components from biomedical HSIs (or arguably any kind of experimentally acquired HSI dataset). Nevertheless, in future work we aim to more precisely quantify the advantages of each architecture by applying them to data with accurate ground truth segmentations, such as more realistic simulated phantoms or other biomedical HSI datasets. One unexplored feature is the use of the uncertainty encoded in the soft cluster assignments to display information about segmentation quality. This could enable a more rigorous comparison of the effectiveness of different segmentation architectures, or the assessment of how the quality of segmentations produced from different samples may vary from region to region.

Funding

Engineering and Physical Sciences Research Council (UKRI EPSRC grant EP/V047914/1).

Disclosures

The authors report no conflicts of interest in this work.

Data availability

Some of the code used to generate the results shown here can be found in Ref. [87]. Much of this is modified code from [75] (clustering module/end-to-end framework) and [15] (UwU-net inspired architecture). The real human colon IR HSIs acquired as part of the Minerva project [82] are not currently available to the public.

Supplemental document

See Supplement 1 for supporting content.

References

1. T. Alexandrov and P. Lasch, “Segmentation of confocal raman microspectroscopic imaging data using edge-preserving denoising and clustering,” Anal. Chem. 85(12), 5676–5683 (2013). [CrossRef]  

2. G. Lu and B. Fei, “Medical hyperspectral imaging: a review,” J. Biomed. Opt. 19(1), 010901 (2014). [CrossRef]  

3. A. ul Rehman and S. A. Qureshi, “A review of the medical hyperspectral imaging systems and unmixing algorithms in biological tissues,” Photodiagn. Photodyn. Ther. 33, 102165 (2021). [CrossRef]  

4. M. Diem, A. Mazur, K. Lenau, J. Schubert, B. Bird, M. Miljković, C. Krafft, and J. Popp, “Molecular pathology via ir and raman spectral imaging,” J. Biophotonics 6(11-12), 855–886 (2013). [CrossRef]  

5. C. Scotté, H. B. de Aguiar, D. Marguet, E. M. Green, P. Bouzy, S. Vergnole, C. P. Winlove, N. Stone, and H. Rigneault, “Assessment of compressive raman versus hyperspectral raman for microcalcification chemical imaging,” Anal. Chem. 90(12), 7197–7203 (2018). [CrossRef]  

6. Y. Khouj, J. Dawson, J. Coad, and L. Vona-Davis, “Hyperspectral imaging and k-means classification for histologic evaluation of ductal carcinoma in situ,” Front. Oncol. 8, 17 (2018). [CrossRef]  

7. M. Hedegaard, C. Krafft, H. J. Ditzel, L. E. Johansen, S. Hassing, and J. Popp, “Discriminating isogenic cancer cells and identifying altered unsaturated fatty acid content as associated with metastasis status, using k-means clustering and partial least squares-discriminant analysis of raman maps,” Anal. Chem. 82(7), 2797–2802 (2010). [CrossRef]  

8. M. Hedegaard, C. Matthäus, S. Hassing, C. Krafft, M. Diem, and J. Popp, “Spectral unmixing and clustering algorithms for assessment of single cells by raman microscopic imaging,” Theor. Chem. Acc. 130(4-6), 1249–1260 (2011). [CrossRef]  

9. C. Krafft, M. A. Diderhoshan, P. Recknagel, M. Miljkovic, M. Bauer, and J. Popp, “Crisp and soft multivariate methods visualize individual cell nuclei in raman images of liver tissue sections,” Vib. Spectrosc. 55(1), 90–100 (2011). [CrossRef]  

10. S. Piqueras, C. Krafft, C. Beleites, K. Egodage, F. Von Eggeling, O. Guntinas-Lichius, J. Popp, R. Tauler, and A. De Juan, “Combining multiset resolution and segmentation for hyperspectral image analysis of biological tissues,” Anal. Chim. Acta 881, 24–36 (2015). [CrossRef]  

11. L. Lauwerends, H. Abbasi, T. Bakker Schut, P. Van Driel, J. Hardillo, I. Santos, E. Barroso, S. Koljenović, A. Vahrmeijer, R. Baatenburg de Jong, G. Puppels, and S. Keereweer, “The complementary value of intraoperative fluorescence imaging and raman spectroscopy for cancer surgery: combining the incompatibles,” Eur. J. Nucl. Med. Mol. Imaging 49(7), 2364–2376 (2022). [CrossRef]  

12. S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Deep learning for hyperspectral image classification: An overview,” IEEE Trans. on Geosci. Remote. Sens. 57(9), 6690–6709 (2019). [CrossRef]  

13. D. Fu and X. S. Xie, “Reliable cell segmentation based on spectral phasor analysis of hyperspectral stimulated raman scattering imaging data,” Anal. Chem. 86(9), 4115–4119 (2014). [CrossRef]  

14. W. J. Tipping, L. T. Wilson, C. An, A. A. Leventi, A. W. Wark, C. Wetherill, N. C. Tomkinson, K. Faulds, and D. Graham, “Stimulated raman scattering microscopy with spectral phasor analysis: Applications in assessing drug–cell interactions,” Chem. Sci. 13(12), 3468–3476 (2022). [CrossRef]  

15. B. Manifold, S. Men, R. Hu, and D. Fu, “A versatile deep learning architecture for classification and label-free prediction of hyperspectral images,” Nat. Machine Intelligence 3(4), 306–315 (2021). [CrossRef]  

16. Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Trans. Geosci. Remote Sensing 49(10), 3973–3985 (2011). [CrossRef]  

17. L. Fang, S. Li, X. Kang, and J. A. Benediktsson, “Spectral–spatial hyperspectral image classification via multiscale adaptive sparse representation,” IEEE Trans. Geosci. Remote Sensing 52(12), 7738–7749 (2014). [CrossRef]  

18. G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Marí, J. Vila-Francés, and J. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sensing Lett. 3(1), 93–97 (2006). [CrossRef]  

19. N. Gorretta, G. Rabatel, C. Fiorio, C. Lelong, and J.-M. Roger, “An iterative hyperspectral image segmentation method using a cross analysis of spectral and spatial information,” Chemom. Intell. Lab. Syst. 117, 213–223 (2012). [CrossRef]  

20. B. Baassou, M. He, S. Mei, and Y. Zhang, “Unsupervised hyperspectral image classification algorithm by integrating spatial-spectral information,” in 2012 International Conference on Audio, Language and Image Processing, (IEEE, 2012), pp. 610–615.

21. R. Hang, Q. Liu, D. Hong, and P. Ghamisi, “Cascaded recurrent neural networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 57(8), 5384–5394 (2019). [CrossRef]  

22. D. Hong, Z. Han, J. Yao, L. Gao, B. Zhang, A. Plaza, and J. Chanussot, “Spectralformer: Rethinking hyperspectral image classification with transformers,” IEEE Trans. Geosci. Remote Sensing 60, 1–15 (2022). [CrossRef]  

23. D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, “Graph convolutional networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 59(7), 5966–5978 (2021). [CrossRef]  

24. L. Zhu, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Generative adversarial networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 56(9), 5046–5063 (2018). [CrossRef]  

25. H. Lee and H. Kwon, “Going deeper with contextual cnn for hyperspectral image classification,” IEEE Trans. on Image Process. 26(10), 4843–4855 (2017). [CrossRef]  

26. M. E. Paoletti, J. M. Haut, R. Fernandez-Beltran, J. Plaza, A. Plaza, J. Li, and F. Pla, “Capsule networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 57(4), 2145–2160 (2019). [CrossRef]  

27. J. Li, X. Zhao, Y. Li, Q. Du, B. Xi, and J. Hu, “Classification of hyperspectral imagery using a new fully convolutional neural network,” IEEE Geosci. Remote Sensing Lett. 15(2), 292–296 (2018). [CrossRef]  

28. Z. He, H. Liu, Y. Wang, and J. Hu, “Generative adversarial networks-based semi-supervised learning for hyperspectral image classification,” Remote Sens. 9(10), 1042 (2017). [CrossRef]  

29. H. Wu and S. Prasad, “Semi-supervised deep learning using pseudo labels for hyperspectral image classification,” IEEE Trans. on Image Process. 27(3), 1259–1270 (2018). [CrossRef]  

30. T. Li, J. Zhang, and Y. Zhang, “Classification of hyperspectral image based on deep belief networks,” in 2014 IEEE international conference on image processing (ICIP), (IEEE, 2014), pp. 5132–5136.

31. M. Midhun, S. R. Nair, V. N. Prabhakar, and S. S. Kumar, “Deep model for classification of hyperspectral image using restricted boltzmann machine,” in Proceedings of the 2014 international conference on interdisciplinary advances in applied computing, (2014), pp. 1–7.

32. Y. Chen, X. Zhao, and X. Jia, “Spectral–spatial classification of hyperspectral data based on deep belief network,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 8(6), 2381–2392 (2015). [CrossRef]  

33. B. Liu, X. Yu, P. Zhang, X. Tan, A. Yu, and Z. Xue, “A semi-supervised convolutional neural network for hyperspectral image classification,” Remote Sens. Lett. 8(9), 839–848 (2017). [CrossRef]  

34. X. Kang, B. Zhuo, and P. Duan, “Semi-supervised deep learning for hyperspectral image classification,” Remote Sens. Lett. 10(4), 353–362 (2019). [CrossRef]  

35. F. F. Shahraki, L. Saadatifard, S. Berisha, M. Lotfollahi, D. Mayerich, and S. Prasad, “Deep learning for hyperspectral image analysis, part ii: Applications to remote sensing and biomedicine,” in Hyperspectral Image Analysis, (Springer, 2020), pp. 69–115.

36. B. Fang, Y. Li, H. Zhang, and J. C.-W. Chan, “Hyperspectral images classification based on dense convolutional networks with spectral-wise attention mechanism,” Remote Sens. 11(2), 159 (2019). [CrossRef]  

37. N. Wambugu, Y. Chen, Z. Xiao, K. Tan, M. Wei, X. Liu, and J. Li, “Hyperspectral image classification on insufficient-sample and feature learning using deep neural networks: A review,” Int. J. Appl. Earth Obs. Geoinformation 105, 102603 (2021). [CrossRef]  

38. S. Mei, J. Ji, Y. Geng, Z. Zhang, X. Li, and Q. Du, “Unsupervised spatial–spectral feature learning by 3d convolutional autoencoder for hyperspectral classification,” IEEE Trans. Geosci. Remote Sensing 57(9), 6808–6820 (2019). [CrossRef]  

39. Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sensing 54(10), 6232–6251 (2016). [CrossRef]  

40. Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 7(6), 2094–2107 (2014). [CrossRef]  

41. R. W. Schmidt, S. Woutersen, and F. Ariese, “Ramanlight—a graphical user-friendly tool for pre-processing and unmixing hyperspectral raman spectroscopy images,” J. Opt. 24(6), 064011 (2022). [CrossRef]  

42. S. Soltani, A. Ojaghi, H. Qiao, N. Kaza, X. Li, Q. Dai, A. O. Osunkoya, and F. E. Robles, “Prostate cancer histopathology using label-free multispectral deep-uv microscopy quantifies phenotypes of tumor aggressiveness and enables multiple diagnostic virtual stains,” Sci. Rep. 12(1), 9329 (2022). [CrossRef]  

43. X. Li, G. Zhang, H. Qiao, F. Bao, Y. Deng, J. Wu, Y. He, J. Yun, X. Lin, H. Xie, H. Wang, and Q. Hai, “Unsupervised content-preserving transformation for optical microscopy,” Light: Sci. Appl. 10(1), 44 (2021). [CrossRef]  

44. M. P. Barbato, P. Napoletano, F. Piccoli, and R. Schettini, “Unsupervised segmentation of hyperspectral remote sensing images with superpixels,” arXiv preprint arXiv:2204.12296 (2022).

45. A. Schclar and A. Averbuch, “A diffusion approach to unsupervised segmentation of hyper-spectral images,” in International Joint Conference on Computational Intelligence, (Springer, 2017), pp. 163–178.

46. P. Ajay, B. Nagaraj, R. A. Kumar, R. Huang, and P. Ananthi, “Unsupervised hyperspectral microscopic image segmentation using deep embedded clustering algorithm,” Scanning 2022, 1–9 (2022). [CrossRef]  

47. Y. Zhao, Y. Yuan, and Q. Wang, “Fast spectral clustering for unsupervised hyperspectral image classification,” Remote Sens. 11(4), 399 (2019). [CrossRef]  

48. C. Wang, M. Gong, M. Zhang, and Y. Chan, “Unsupervised hyperspectral image band selection via column subset selection,” IEEE Geosci. Remote Sensing Lett. 12(7), 1411–1415 (2015). [CrossRef]  

49. A. Mughees, X. Chen, and L. Tao, “Unsupervised hyperspectral image segmentation: Merging spectral and spatial information in boundary adjustment,” in 2016 55th annual conference of the society of instrument and control engineers of Japan (SICE), (IEEE, 2016), pp. 1466–1471.

50. J. Ye, T. Wittman, X. Bresson, and S. Osher, “Segmentation for hyperspectral images with priors,” in International Symposium on Visual Computing, (Springer, 2010), pp. 97–106.

51. F. Li, M. K. Ng, R. Plemmons, S. Prasad, and Q. Zhang, “Hyperspectral image segmentation, deblurring, and spectral analysis for material identification,” in Visual Information Processing XIX, vol. 7701 (SPIE, 2010), pp. 21–32.

52. N. Gillis, D. Kuang, and H. Park, “Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization,” IEEE Trans. Geosci. Remote Sensing 53(4), 2066–2078 (2015). [CrossRef]  

53. J. M. Murphy and M. Maggioni, “Unsupervised clustering and active learning of hyperspectral images with nonlinear diffusion,” IEEE Trans. Geosci. Remote Sensing 57(3), 1829–1845 (2019). [CrossRef]  

54. J. Nalepa, M. Myller, Y. Imai, K.-i. Honda, T. Takeda, and M. Antoniak, “Unsupervised segmentation of hyperspectral images using 3-d convolutional autoencoders,” IEEE Geosci. Remote Sensing Lett. 17(11), 1948–1952 (2020). [CrossRef]  

55. A. Obeid, I. M. Elfadel, and N. Werghi, “Unsupervised land-cover segmentation using accelerated balanced deep embedded clustering,” IEEE Geosci. Remote Sensing Lett. 19, 1–5 (2022). [CrossRef]  

56. F. Li, H. Qiao, and B. Zhang, “Discriminatively boosted image clustering with fully convolutional auto-encoders,” Pattern Recognit. 83, 161–173 (2018). [CrossRef]  

57. J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for clustering analysis,” in International Conference on Machine Learning, (PMLR, 2016), pp. 478–487.

58. J. Yang, D. Parikh, and D. Batra, “Joint unsupervised learning of deep representations and image clusters,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), pp. 5147–5156.

59. H. Liu, M. Shao, S. Li, and Y. Fu, “Infinite ensemble clustering,” Data Min. Knowl. Discov. 32(2), 385–416 (2018). [CrossRef]  

60. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science 313(5786), 504–507 (2006). [CrossRef]  

61. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, and L. Bottou, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research 11, 3371–3408 (2010).

62. H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, “Exploring strategies for training deep neural networks,” Journal of Machine Learning Research 10, 1–40 (2009).

63. J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked convolutional auto-encoders for hierarchical feature extraction,” in International Conference on Artificial Neural Networks (Springer, 2011), pp. 52–59.

64. H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Unsupervised learning of hierarchical representations with convolutional deep belief networks,” Commun. ACM 54(10), 95–103 (2011). [CrossRef]  

65. C. Song, F. Liu, Y. Huang, L. Wang, and T. Tan, “Auto-encoder based data clustering,” in Iberoamerican Congress on Pattern Recognition (Springer, 2013), pp. 117–124.

66. P. Huang, Y. Huang, W. Wang, and L. Wang, “Deep embedding network for clustering,” in 2014 22nd International Conference on Pattern Recognition, (IEEE, 2014), pp. 1532–1537.

67. P. Abdolghader, A. Ridsdale, T. Grammatikopoulos, G. Resch, F. Légaré, A. Stolow, A. F. Pegoraro, and I. Tamblyn, “Unsupervised hyperspectral stimulated Raman microscopy image enhancement: denoising and segmentation via one-shot deep learning,” Opt. Express 29(21), 34205–34219 (2021). [CrossRef]

68. L. Mou, P. Ghamisi, and X. X. Zhu, “Unsupervised spectral–spatial feature learning via deep residual conv–deconv network for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 56(1), 391–406 (2018). [CrossRef]  

69. X. Zhou and S. Prasad, “Advances in deep learning for hyperspectral image analysis—addressing challenges arising in practical imaging scenarios,” in Hyperspectral Image Analysis, (Springer, 2020), pp. 117–140.

70. C. Tao, H. Pan, Y. Li, and Z. Zou, “Unsupervised spectral–spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification,” IEEE Geosci. Remote Sensing Lett. 12(12), 2438–2442 (2015). [CrossRef]  

71. X. Ma, H. Wang, and J. Geng, “Spectral–spatial classification of hyperspectral image based on deep auto-encoder,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 9(9), 4073–4085 (2016). [CrossRef]  

72. R. Kemker and C. Kanan, “Self-taught feature learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 55(5), 2693–2705 (2017). [CrossRef]  

73. J. Ji, S. Mei, J. Hou, X. Li, and Q. Du, “Learning sensor-specific features for hyperspectral images via 3-dimensional convolutional autoencoder,” in 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), (IEEE, 2017), pp. 1820–1823.

74. X. Han, Y. Zhong, and L. Zhang, “Spatial-spectral unsupervised convolutional sparse auto-encoder classifier for hyperspectral imagery,” Photogrammetric Engineering & Remote Sensing 83(3), 195–206 (2017). [CrossRef]  

75. X. Guo, X. Liu, E. Zhu, and J. Yin, “Deep clustering with convolutional autoencoders,” International Conference on Neural Information Processing 10635, 373–382 (2017). [CrossRef]

76. D. W. Shipp, F. Sinjab, and I. Notingher, “Raman spectroscopy: techniques and applications in the life sciences,” Adv. Opt. Photonics 9(2), 315–428 (2017). [CrossRef]  

77. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

78. Y. Chong, L. Chen, and S. Pan, “End-to-end joint spectral–spatial compression and reconstruction of hyperspectral images using a 3d convolutional autoencoder,” J. Electron. Imag. 30(04), 041403 (2021). [CrossRef]  

79. S. M. Vibhu, “A survey of accelerator architectures for 3d convolution neural networks,” J. Syst. Archit. 115, 102041 (2021). [CrossRef]  

80. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2015), pp. 234–241.

81. S. Mosca, P. Dey, T. A. Tabish, F. Palombo, N. Stone, and P. Matousek, “Determination of inclusion depth in ex vivo animal tissues using surface enhanced deep Raman spectroscopy,” J. Biophotonics 13(1), e201960092 (2020). [CrossRef]

82. V. Naranjo, F. Penaranda, M. Alcaniz, B. Napier, M. Farries, G. Stevens, J. Ward, C. Barta, R. Hasal, A. Seddon, S. Sujecki, S. Lamrini, U. Moller, O. Bang, P. Moselund, M. Abdalla, D. D. Gaspari, H. M. R. Vinella, G. Lloyd, N. Stone, J. Nallala, J. Schnekenburger, L. L. Kastl, and B. Kemper, “Minerva project, mid- to near-infrared spectroscopy for improved medical diagnostics,” European Project Space on Intelligent Systems, Pattern Recognition and Biomedical Systems, 53 (2015). [CrossRef]

83. S. Rathore, M. A. Iftikhar, A. Chaddad, T. Niazi, T. Karasic, and M. Bilello, “Segmentation and grade prediction of colon cancer digital pathology images across multiple institutions,” Cancers 11(11), 1700 (2019). [CrossRef]  

84. J. K. Chan, “The wonderful colors of the hematoxylin–eosin stain in diagnostic surgical pathology,” Int. J. Surg. Pathol. 22(1), 12–32 (2014). [CrossRef]  

85. J. Nallala, G. R. Lloyd, N. Shepherd, and N. Stone, “High-resolution FTIR imaging of colon tissues for elucidation of individual cellular and histopathological features,” Analyst 141(2), 630–639 (2016). [CrossRef]  

86. P. Fränti and S. Sieranoja, “K-means properties on six clustering benchmark datasets,” Appl. Intell. 48(12), 4743–4759 (2018). [CrossRef]  

87. C. Bench, “Unsupervised segmentation of biomedical hyperspectral image data: tackling high dimensionality with convolutional autoencoders,” Github, 2022, https://github.com/ciaranbench/unsupervised-HSI-seg.

Supplementary Material (1)

Supplement 1: Supplemental Document

Data availability

Some of the code used to generate the results shown here can be found in Ref. [87]. Much of it is adapted from the code of Ref. [75] (clustering module/end-to-end framework) and Ref. [15] (UwU-net inspired architecture). The real human colon IR HSIs acquired as part of the Minerva project [82] are not currently available to the public.

Figures (5)

Fig. 1. Schematic showing how 2D convolutional filters (red) are applied to multi-channel 2D images, and 3D convolutional filters (also red) are applied to single-channel 3D images. 2D filters span the whole spectral band and are ‘scanned’ across the sample in the x-y dimensions [78,79]. In contrast, 3D filters are typically smaller in the band dimension ‘b’, and take steps in all three dimensions. 2D filters are capable of detecting features that span the whole spectral band (or several features found across the band) but may lack sensitivity to subtle features that occur only within a specific subset of the band. 3D filters also lack this sensitivity, as they are predisposed to detect more generic features that may occur anywhere within the HSI.
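The distinction drawn in Fig. 1 can be made concrete in a few lines of code. The sketch below (PyTorch is assumed for illustration; it is not taken from the paper's repository, and the patch and filter sizes are arbitrary) shows how a 2D convolution consumes the full band axis as input channels, while a 3D convolution keeps the band axis and steps through it:

```python
# Minimal sketch contrasting 2D and 3D convolutions over an HSI patch
# of shape (bands, height, width). Sizes are illustrative.
import torch
import torch.nn as nn

bands, h, w = 100, 32, 32
patch = torch.randn(1, bands, h, w)  # a batch containing one HSI patch

# 2D convolution: each filter spans ALL spectral bands (in_channels=bands)
# and is scanned only across the x-y dimensions.
conv2d = nn.Conv2d(in_channels=bands, out_channels=16, kernel_size=3, padding=1)
out2d = conv2d(patch)  # shape (1, 16, 32, 32): the band axis is collapsed

# 3D convolution: the filter is small in the band dimension 'b' (here 7)
# and takes steps along bands as well as x-y, so it can respond to
# spatio-spectral features anywhere in the cube.
conv3d = nn.Conv3d(in_channels=1, out_channels=16,
                   kernel_size=(7, 3, 3), padding=(3, 1, 1))
out3d = conv3d(patch.unsqueeze(1))  # (1, 1, 100, 32, 32) -> (1, 16, 100, 32, 32)

print(out2d.shape, out3d.shape)
```

Note that the 2D output retains no band axis to scan, whereas the 3D output does.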
Fig. 2. Left: An example $240\times 240$ pixel tissue mask used in the construction of a synthetic HSI. Three different fat patterns are embedded in the muscle: a solid section, a striped section, and a section containing a random distribution of fat ‘globules’. Right: Raman spectra of porcine fat and muscle samples (back bacon) used to construct synthetic HSIs. These spectra were not pre-processed or normalised, and were acquired with an InVia imaging system (Renishaw, Wotton-under-Edge, UK) with an excitation wavelength of 830 nm, a power of 130 mW, a 50$\times$ long working distance objective, a 600 L/mm grating, and a 3 s $\times$ 10 exposure time.
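As a rough illustration of how a tissue mask and a pair of measured spectra can be combined into a synthetic HSI of the kind shown in Fig. 2, consider the following NumPy sketch (the mask geometry and the placeholder spectra are assumptions for illustration, not the authors' generation code):

```python
# Minimal sketch: build a synthetic HSI from a binary tissue mask and two
# reference spectra. Every 'fat' pixel receives the fat spectrum and every
# other pixel the muscle spectrum.
import numpy as np

n_bands = 500
rng = np.random.default_rng(0)
fat_spectrum = rng.random(n_bands)     # placeholder for the measured fat spectrum
muscle_spectrum = rng.random(n_bands)  # placeholder for the measured muscle spectrum

mask = np.zeros((240, 240), dtype=bool)  # 240x240 tissue mask, as in Fig. 2
mask[40:80, 40:200] = True               # e.g. a solid fat section
mask[120:200:8, 40:200] = True           # e.g. a striped fat section

# Broadcast each spectrum into the pixels selected by the mask.
hsi = np.where(mask[..., None], fat_spectrum, muscle_spectrum)
print(hsi.shape)  # (240, 240, 500)
```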
Fig. 3. UwU-net inspired segmentation network architecture. The CAE module (A) is used to mine features from the input HSI patch and compress them into a 100-element latent vector. The architecture is inspired by the UwU-net, which is hypothesised to provide a more effective feature extraction framework than more generic CAE architectures. The clustering module (C) outputs the probability that the patch belongs to each cluster. In the end-to-end training scheme, (A) is initially trained in isolation; after this pretraining step, both (A) and (C) are trained together.
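The two-stage scheme described in the caption can be sketched as follows (PyTorch is assumed; `cae`, `clustering_head`, the data loader, the target-distribution function, and the loss weighting `gamma` are illustrative placeholders patterned on the DCEC-style framework of Ref. [75], not the exact implementation):

```python
# Stage 1: pretrain the CAE on reconstruction alone.
# Stage 2: train CAE + clustering module jointly on a combined loss.
import torch

def pretrain(cae, loader, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(cae.parameters(), lr=lr)
    mse = torch.nn.MSELoss()
    for _ in range(epochs):
        for patches in loader:
            recon, _ = cae(patches)  # assumed: CAE returns (reconstruction, latent)
            loss = mse(recon, patches)
            opt.zero_grad()
            loss.backward()
            opt.step()

def train_end_to_end(cae, clustering_head, loader, targets_fn,
                     epochs=50, lr=1e-3, gamma=0.1):
    params = list(cae.parameters()) + list(clustering_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    mse = torch.nn.MSELoss()
    kld = torch.nn.KLDivLoss(reduction='batchmean')
    for _ in range(epochs):
        for patches in loader:
            recon, z = cae(patches)
            q = clustering_head(z)  # soft cluster assignments per patch
            p = targets_fn(q)       # sharpened target distribution
            # reconstruction loss keeps features faithful; the clustering
            # term pulls similar latent vectors towards shared clusters
            loss = mse(recon, patches) + gamma * kld(q.log(), p)
            opt.zero_grad()
            loss.backward()
            opt.step()
```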
Fig. 4. Segmentation results and ground truth for each synthetic HSI. All images shown here have been assigned random colour scales.
Fig. 5. Segmentations of IR HSIs of real human colon samples produced with end-to-end clustering using various architectures. The HE stained adjacent tissue slices used as approximate ground truths are also depicted. All three architectures produced comparable results and appear able to segment the locations of most glands. However, a precise evaluation of segmentation accuracy is not possible given the approximate nature of the ground truth.

Tables (1)

Table 1. NMI and ARS scores for the segmentations produced for each synthetic HSI.
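For reference, both metrics reported in Table 1 are available in scikit-learn (assumed here for illustration); each compares a predicted cluster map against a ground-truth label map and is invariant to permutations of the cluster labels:

```python
# Minimal sketch: compute NMI and ARS between flattened label maps.
# The arrays below are illustrative, not data from the paper.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

ground_truth = np.array([0, 0, 1, 1, 2, 2])  # e.g. flattened ground-truth map
predicted    = np.array([1, 1, 0, 0, 2, 2])  # e.g. flattened cluster map

# Both scores reach 1.0 here despite the swapped labels 0 and 1,
# because they depend only on the grouping, not the label values.
print("NMI:", normalized_mutual_info_score(ground_truth, predicted))
print("ARS:", adjusted_rand_score(ground_truth, predicted))
```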
