5 Conclusion and Future Work

Through this project we sought to explore the potential of RNA-Seq to contribute to the characterization of iPSC-induced CMs and EHMs for clinical applications. This was achieved firstly by establishing a workflow to analyse the sequenced data and secondly by using a computational deconvolution technique to gain sub-population-level knowledge from bulk data.

The results obtained from computational deconvolution serve as a proof of principle. This digital cytometry technique, which has not yet been employed in this particular context, validated the protocol currently used by the group to produce CMs and EHMs: CM samples are mostly (~91%) composed of either committed cardiac cells or cardiomyocytes, while EHMs contain about 35% non-myocytes. In addition, the CMs in the pure CM samples are less mature than their counterparts in EHMs, which is in line with the view that CMs tend to mature in a 3D environment such as that of an EHM. These results relied on an input with known and defined cell types, and the approach is in that sense a supervised one. Using PCA, an unsupervised technique, we showed that the majority of the variation across the different groups containing CMs (namely adult heart, fetal heart, EHM and CM samples), as captured by PC1, does not explain the biological differences in their CM populations and is possibly due to differences in tissue complexity. We also showed that this deconvolution technique, combined with an easy workflow, could be used to follow the efficiency and consistency of differentiation across different runs and batches of production. A key reason why this could be done routinely is that bulk sequencing is cheaper than single-cell sequencing. Computational deconvolution makes it possible to obtain a benefit of single-cell sequencing (in our use case, knowing the sub-populations within a sample) from bulk sequencing alone, saving cost and time. This could also complement the standard FACS-based methods of tracking differentiation.
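The idea behind reference-based deconvolution can be illustrated with a minimal sketch. It assumes the simplest formulation, in which a bulk profile is modelled as a non-negative mixture of per-cell-type reference profiles and the mixing weights are found by non-negative least squares. This is not necessarily the algorithm used in this project, and the signature matrix, cell-type labels and fractions below are invented toy values, not project data.

```python
import numpy as np


def estimate_fractions(signature, bulk, n_iter=2000):
    """Estimate cell-type fractions by non-negative least squares.

    signature: (genes x cell_types) reference expression per cell type
    bulk:      (genes,) expression profile of one bulk sample
    """
    signature = np.asarray(signature, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    # Step size 1/L, with L the squared largest singular value of the signature.
    step = 1.0 / np.linalg.norm(signature, 2) ** 2
    weights = np.zeros(signature.shape[1])
    for _ in range(n_iter):
        gradient = signature.T @ (signature @ weights - bulk)
        # Gradient step, then projection onto the constraint weights >= 0.
        weights = np.maximum(weights - step * gradient, 0.0)
    total = weights.sum()
    # Rescale so the weights can be read as fractions of the sample.
    return weights / total if total > 0 else weights


# Hypothetical labels and toy marker-like profiles (6 genes x 3 cell types).
cell_types = ["cardiomyocyte", "committed cardiac cell", "non-myocyte"]
signature = np.array([
    [10.0, 1.0, 0.0],
    [8.0, 0.0, 1.0],
    [1.0, 9.0, 0.0],
    [0.0, 7.0, 1.0],
    [0.0, 1.0, 12.0],
    [1.0, 0.0, 6.0],
])

# Simulate a noiseless bulk sample with known composition.
true_fractions = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_fractions

estimated = estimate_fractions(signature, bulk)
for name, fraction in zip(cell_types, estimated):
    print(f"{name}: {fraction:.3f}")
```

In this noiseless toy case the known composition is recovered. Real bulk data add noise, normalisation differences and platform effects, which dedicated deconvolution tools are designed to handle.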

However, before this process becomes a regular part of quality control or characterization, the limitations of the technique need to be addressed. Chief among them is that the results obtained by computational deconvolution are only as good as the scRNA-Seq reference dataset: the granularity of the information that can be obtained, and its true relevance, depend largely on the reference used. Consequently, new and unknown cell types cannot be characterised or assessed using this technique, which limits its use in exploration or discovery. In this project, the scRNA-Seq reference was generated with a protocol that is comparable to the in-house protocol and has the same end result, i.e., producing CMs from iPSCs, but it is not exactly the same protocol. To make the approach robust, comparable and of true value in monitoring CMs or EHMs across different production runs, it would be prudent to produce a standardized in-house scRNA-Seq reference dataset.
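The dependence on the reference can be made concrete with a small simulation. The sketch below assumes a simple non-negative least squares model of deconvolution, which is an illustrative stand-in rather than the method used in this project. All profiles, fractions and the "unprofiled" cell type are hypothetical toy values. A bulk sample is mixed from four cell types and then deconvolved twice: once with the complete reference and once with a reference that lacks the fourth type.

```python
import numpy as np


def deconvolve(signature, bulk, n_iter=5000):
    """Return (fractions, residual norm) from a non-negative least squares fit."""
    signature = np.asarray(signature, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    # Step size 1/L, with L the squared largest singular value of the signature.
    step = 1.0 / np.linalg.norm(signature, 2) ** 2
    weights = np.zeros(signature.shape[1])
    for _ in range(n_iter):
        gradient = signature.T @ (signature @ weights - bulk)
        # Gradient step, then projection onto the constraint weights >= 0.
        weights = np.maximum(weights - step * gradient, 0.0)
    residual = np.linalg.norm(signature @ weights - bulk)
    return weights / weights.sum(), residual


# Toy reference: 6 genes x 4 cell types. The last column stands for a
# hypothetical cell type that the incomplete reference has never profiled.
full_reference = np.array([
    [10.0, 1.0, 0.0, 0.0],
    [8.0, 0.0, 1.0, 0.0],
    [1.0, 9.0, 0.0, 1.0],
    [0.0, 7.0, 1.0, 6.0],
    [0.0, 1.0, 12.0, 0.0],
    [1.0, 0.0, 6.0, 3.0],
])
incomplete_reference = full_reference[:, :3]

# The simulated sample contains 20% of the unprofiled cell type.
true_fractions = np.array([0.5, 0.2, 0.1, 0.2])
bulk = full_reference @ true_fractions

complete_fit, complete_residual = deconvolve(full_reference, bulk)
incomplete_fit, incomplete_residual = deconvolve(incomplete_reference, bulk)

print("complete reference:  ", np.round(complete_fit, 3), "residual", round(complete_residual, 4))
print("incomplete reference:", np.round(incomplete_fit, 3), "residual", round(incomplete_residual, 4))
```

With the incomplete reference the estimated fractions still sum to one, so the expression of the missing cell type is silently redistributed over the known types. The only hint is a larger fit residual, which is why the reference should match the production protocol as closely as possible.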

In the context of clinical use, evaluation of microbial contamination of engineered tissue is extensive and thorough, and is usually carried out by microbiological methods. Part of this thesis also explored the presence of potential microbial contaminants using the sequenced data. The results were in line with most other findings in this area and demonstrated a non-standard use of RNA-Seq data for microbial detection. Incidentally, spike-ins were detected and confirmed, which supports the method employed. However, this generality of usage comes at the cost of specificity. The approach probably cannot serve as a standalone method of microbial estimation or detection, but it could flag extreme cases that warrant deeper, more focused investigation.

This was an exploratory analysis that used data from varied sources, none of which were generated specifically for the questions asked in this project. For instance, the experimental design and the choices therein (sample sizes, replicates, etc.) were neither uniform across datasets nor tailored to the chosen questions. Although measures were applied to mitigate potential batch effects (samples were sequenced at different time points, on different instruments and at different depths), these effects remain a limitation to consider.

In summary, we demonstrated the potential of computational deconvolution techniques to gain sub-population-level information from bulk data, and their possible role in aiding the refinement and quality control of the protocols used to produce iPSC-induced CMs and EHMs.