Let there be Light: The Future of Cloud Computing
Published:
This was a conversation piece co-authored by myself and my colleague Zacharaya Shabka in an attempt to outline the motivation behind making data centres all-optical for a broad audience.
Google search queries, Netflix video streams, Amazon purchases, and almost every other online task all communicate data between the end-user and a data centre. However, when a simple ‘packet’ of data (e.g. a string of Google search terms) arrives at a data centre, the data centre must often leverage the computing power of many servers to serve the request specified in the packet. For example, a short sentence typed into Google translate is a small amount of information which is communicated to the data centre. Upon its arrival, however, the data centre must leverage a large amount of compute (servers) to, in this example, apply a machine learning inference model to translate the sentence. In this example the inter-data centre (between the user and the data centre) traffic was small, but the intra-data centre (between servers in a single data centre) traffic was considerably larger; 90% of the 20.6 zettabytes of annual data centre traffic are being communicated inside data centres.
For this reason, the network connecting the many servers inside a data centre is as important as the servers themselves; data centres do not leverage such huge compute power purely as a function of how many computers they contain, but on how many computers they can leverage collectively through communicating data between them.
However, current data centre networks are unfit to meet the aforementioned increase in data centre usage as well as the new and often bandwidth-heavy ways in which data centres are increasingly being used (e.g. high bandwidth machine learning algorithm training with specialised hardware). Internal data centre networks propagate information as light in glass optical fibres, but they must convert the signal to electrons each time they reach a node on their way to their destination. These networks are therefore not all-optical and are referred to interchangeably as 'electronic networks'. Electronic networks have become prohibitive to data centre performance and cost for two related reasons. Firstly, the post Moore’s Law era (an ongoing phenomenon by which it becomes increasingly difficult to scale down electronics in a cost-effective way) poses a threat to the ability of the electronic network switches to up-scale to higher bandwidths. Electronic switches cannot support arbitrary bandwidths but are designed for a specific data rate. Faster bandwidth electronic switches require more energy to operate, however electronics have typically scaled such that the average cost-per-bit has still been able to decrease. This scalability trend is expected to diminish with the end of Moore’s Law, whereby increasing the bandwidth of electronic switches in the network is no longer financially feasible for data centre operators. Secondly, data centres consume roughly as much energy as the UK, emitting more CO2 emissions than the entire aviation industry. Since the internal data centre network accounts for 30% of this power consumption, it is desirable to have an energy-bandwidth scalable network, where the energetic cost of the network does not significantly increase relative to higher bandwidth support.
Unlike electronic networks, optical networks can support very large bandwidths in a highly scalable way, since the architecture of the network components is independent of the data rate. This means that if bandwidth demands increase, the same network infrastructure can be used. Furthermore, optical networks produce less heat, and therefore require less power-hungry cooling, and do not require the expensive signal-boosting that electronic signals do since the power loss of optical signals inside optic fibres is very low. Additionally, no conversion between optical and electronic signals is needed to interface the data centre network with external optical networks, thereby decreasing latency times and increasing data centre performance.
However, data centres must be able to reconfigure (i.e. create a path) in an amount of time significantly less than the length of the packets. Otherwise, if a signal needs to be sent from A to B, and then another from A to C, more time is spent waiting for the network to create a path from A to C after the first message has been sent from A to B than is spent transmitting the data. This is a significant network inefficiency. The ability of an optical network to reconfigure is typically limited to microsecond timescales. Although this seems fast, the average data packet inside a data centre lasts for an amount of time on the order of 10 nanoseconds, meaning current switching is roughly 1,000 times slower than the transmission time. This incurs unacceptable network latency, which limits optical data centre compute performance.
Legacy electronic data centre networks must become all-optical. Our research is concerned with designing optical network components and architectures that can reconfigure at the required timescales. This will enable the scalability and performance benefits of optical networks to be employed inside data centres, facilitating the next decade of technological advancement across many fields.