Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Let there be Light: The Future of Cloud Computing

6 minute read

Published:

This was a conversation piece co-authored with my colleague Zacharaya Shabka, aiming to outline for a broad audience the motivation behind making data centres all-optical.

projects

3D Holographic AR and VR Displays

Published:

Modern augmented and virtual reality systems such as Oculus VR, Magic Leap and HoloLens products are fundamentally unfit for purpose because they use 2D projection, which leads to eye fatigue and a lack of immersion for the user. Holography is the only way to produce truly 3D images, and could therefore emerge as the leading technology for such display systems. Historically, to achieve large field-of-view holographic displays, engineers had to use either smaller spatial light modulators or telescope de-magnification techniques, resulting in uncomfortably small and often impractical display sizes. In collaboration with the University of Cambridge and holographic display company VividQ, this project saw the development of a system capable of expanding the eyebox without compromising on display size, significantly improving the usability and quality of 3D holographic displays.

Huawei DriveML Challenge

Published:

Autonomous driving holds the promise of reducing traffic accidents by designing safe, robust, accurate and intelligent agents. In this introductory-level competition, Huawei open-accessed their traffic perception simulation system, with which competitors had 6 weeks to design, train and test a single agent navigating roads and traffic. My team used the NeuroEvolution of Augmenting Topologies (NEAT) genetic algorithm to evolve the agent's policy within a reinforcement learning framework; however, performance was ultimately limited by poor generalisation to unseen scenarios.
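
As a rough illustration of how such a policy can be evolved, the sketch below uses the neat-python library with a hypothetical gym-style driving environment; `make_driving_env`, the observation/action handling, and the config file name are all assumptions for illustration, not the project's actual setup. Each genome's fitness is simply the return it collects over one episode.

```python
import neat  # neat-python: NeuroEvolution of Augmenting Topologies


def make_driving_env():
    """Hypothetical stand-in for Huawei's traffic perception simulator (not the real API)."""
    raise NotImplementedError


def eval_genomes(genomes, config):
    # Fitness of each genome = total reward collected over one episode.
    env = make_driving_env()
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        obs, done, total_reward = env.reset(), False, 0.0
        while not done:
            action = net.activate(obs)            # network outputs -> e.g. steering/throttle
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        genome.fitness = total_reward


config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat_config.ini")            # assumed config file name
population = neat.Population(config)
best_genome = population.run(eval_genomes, n=50)   # evolve for 50 generations
```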

Network Attack Detection

Published:

Software to detect network intrusions protects a computer network from unauthorised users, including perhaps insiders. The KDD dataset from the 1999 DARPA Intrusion Detection Evaluation Program competition contains roughly 5 million network connection request fingerprints split into 4 broad categories (DoS, R2L, U2R and probing attacks) which can be further sub-divided into 23 forms of attack. Using a standard sequential neural network with 3 hidden layers, a model was trained with a supervised learning framework in a client-server architecture to detect malicious network requests with 99.99% accuracy.
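
For illustration, a minimal version of such a classifier could look like the Keras sketch below. The layer widths, the 41-feature input (the standard KDD'99 connection-record feature count after preprocessing), the 23-way output, and the training hyperparameters are assumptions rather than the project's exact configuration; the label granularity (23 attack types, 4 broad categories, or binary normal/attack) is a modelling choice.

```python
from tensorflow import keras

# Sequential classifier with 3 hidden layers for KDD'99 connection records.
model = keras.Sequential([
    keras.layers.Input(shape=(41,)),               # 41 features per record (assumed preprocessing)
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(23, activation="softmax"),  # one class per attack type (assumed granularity)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train: (n_samples, 41) float features; y_train: integer attack labels.
# model.fit(x_train, y_train, epochs=10, batch_size=256, validation_split=0.1)
```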

Smart Ski Boot

Published:

Embedded systems hardware can be more complex to program, but can bring benefits in terms of cost and power consumption. This project saw the development of a prototype ‘smart ski boot’ that could deliver more accurate, in-depth and affordable ski technique instruction than expensive human instructors, who typically charge £500–700 a day. This stands to benefit not only skiers, who will save money on tuition fees and receive superior teaching, but also the ski industry, from restaurateurs to equipment providers, whose customer base will grow as fewer people are priced out of the sport.

Resource Management in Distributed Deep Learning Optical Clusters (Ongoing)

Published:

(Paper One) (GitHub) Low-latency, high-bandwidth, ultra-scalable optical circuit switched networks can address the limitations of current compute clusters and enable the deployment of next-generation high-performance clusters and data centres. In particular, machine learning workloads present a unique opportunity for developing specialised circuit-based clusters because they are predictable, periodic, and consist mostly of large network flows. Furthermore, trillion-parameter learning models are being developed whose final test performance is capped primarily by model size; a characteristic which, in the ‘strong scaling’ case, is limited by the bandwidth of the network connecting the cluster’s servers. In this work, we aim to address the challenge of how to make resource management decisions (from computation graph partitioning and placement to server allocation and scheduling) when training massive models on an optical cluster with distributed deep learning. By framing the problem as a Markov decision process in which sequential actions must be taken to maximise some reward (such as minimising the overall job completion time), a graph neural network can be trained from scratch with end-to-end reinforcement learning to allocate the cluster’s resources near-optimally. We are in the process of developing a suite of cluster environments, graph neural network models, and reinforcement learning algorithms in order to achieve this, and we hope to demonstrate both good performance and the ability to scale to large networks and jobs.
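
The Markov decision process framing can be sketched in skeleton form as below. `ToyClusterEnv` is a toy stand-in for the real optical cluster environments, and the graph neural network policy is replaced by a trivial least-loaded-server rule purely to show the interaction loop; none of the names or numbers here come from the project itself.

```python
import random


class ToyClusterEnv:
    """Toy MDP: each episode sequentially places n_ops operations onto one of n_servers
    servers; the terminal reward is the negative makespan, a crude proxy for the negative
    job completion time used as the reward signal in the real problem."""

    def __init__(self, n_ops=8, n_servers=4):
        self.n_ops, self.n_servers = n_ops, n_servers

    def reset(self):
        self.op_costs = [random.uniform(1.0, 5.0) for _ in range(self.n_ops)]
        self.server_load = [0.0] * self.n_servers
        self.t = 0
        return (tuple(self.server_load), self.op_costs[self.t])

    def step(self, server):
        self.server_load[server] += self.op_costs[self.t]
        self.t += 1
        done = self.t == self.n_ops
        reward = -max(self.server_load) if done else 0.0  # reward only at episode end
        state = None if done else (tuple(self.server_load), self.op_costs[self.t])
        return state, reward, done


def policy(state, n_servers):
    """Placeholder for the learned graph neural network policy: pick the least-loaded server."""
    server_load, _ = state
    return min(range(n_servers), key=lambda s: server_load[s])


env = ToyClusterEnv()
state, done, episode_return = env.reset(), False, 0.0
while not done:
    state, reward, done = env.step(policy(state, env.n_servers))
    episode_return += reward
print("episode return (negative makespan):", episode_return)
```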

Ultra-Fast Optical Switch Optimisation

Published:

(Paper One) (Paper Two) (Paper Three) (GitHub) (Documentation) One of the primary bottlenecks to all-optical data centre networks is the lack of a packet-timescale switch. This project saw the application of AI techniques to switch semiconductor optical amplifiers in just half a nanosecond. The AI-optimised approach beat the previous world record by an order of magnitude and, for the first time, offered the potential to scale to thousands of switches in a real data centre.
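
To give a flavour of the optimisation involved, below is a minimal particle swarm optimisation sketch over the samples of a drive signal. The plant model (a toy linear filter), the cost terms, and all hyperparameters are stand-ins for illustration; they are not the experimental setup or the exact algorithms used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = np.exp(-np.arange(10) / 3.0)
kernel /= kernel.sum()                               # toy linear plant standing in for the SOA

def cost(drive):
    """Penalise overshoot above, and deviation from, the target 'on' level of 1.0.
    In the real experiment this would be computed from the measured SOA optical output."""
    response = np.convolve(drive, kernel)[: len(drive)]
    overshoot = max(response.max() - 1.0, 0.0)
    settling_error = np.abs(response[len(response) // 2:] - 1.0).mean()
    return overshoot + settling_error

n_particles, n_points, n_iters = 30, 20, 200
w, c1, c2 = 0.7, 1.5, 1.5                            # inertia and acceleration coefficients

x = rng.uniform(0.0, 2.0, (n_particles, n_points))   # candidate drive signals
v = np.zeros_like(x)
pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
gbest = pbest[pbest_cost.argmin()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = np.clip(x + v, 0.0, 2.0)
    costs = np.array([cost(p) for p in x])
    improved = costs < pbest_cost
    pbest[improved], pbest_cost[improved] = x[improved], costs[improved]
    gbest = pbest[pbest_cost.argmin()].copy()

print("best cost found:", pbest_cost.min())
```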

Reinforcement Learning for Combinatorial Optimisation

Published:

(Paper One) (Paper Two) (GitHub) Optimisation problems are search problems in which a solution maximising some objective is sought amongst a search space. Combinatorial optimisation (CO) is an optimisation sub-category where the solution being sought is a discrete variable (e.g. an integer, a graph, a set, etc.) amongst a finite (or countably infinite) space of possible solutions. Many real-world problems fall under the broad category of CO, from network routing and scheduling to protein folding and fundamental science. However, with many CO problems being NP-hard, solving non-trivial instance sizes in reasonable time frames is a significant challenge. Although CO solvers were studied and designed extensively in the latter half of the 20th century, recent years have seen a resurgence in their academic study with the application of machine learning to solving CO problems. This work saw the application of graph neural networks and reinforcement learning to learn to solve graph-based combinatorial optimisation problems from scratch. This was done through the design of two new machine learning algorithms. The first achieved state-of-the-art scalability for learned heuristic solutions, and the second enabled the integration of reinforcement learning into exact branch-and-bound solvers. These are important steps towards establishing machine learning as the go-to approach for solving CO problems, which will unlock advances in a plethora of real-world applications.
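
As a concrete example of the graph-based CO problems targeted here, the snippet below solves Maximum Cut on a small random graph with a plain greedy local search. This is purely to illustrate the problem class; it is not one of the learned algorithms developed in this work.

```python
import random
import networkx as nx

random.seed(0)
graph = nx.erdos_renyi_graph(n=50, p=0.15, seed=0)

def cut_value(graph, side):
    """Number of edges crossing the partition defined by side[node] in {0, 1}."""
    return sum(1 for u, v in graph.edges if side[u] != side[v])

# Greedy local search: flip any vertex whose flip increases the cut, until none does.
side = {node: random.randint(0, 1) for node in graph.nodes}
improved = True
while improved:
    improved = False
    for node in graph.nodes:
        # Flipping `node` gains an edge for every same-side neighbour and loses one otherwise.
        gain = sum(1 if side[nbr] == side[node] else -1 for nbr in graph.neighbors(node))
        if gain > 0:
            side[node] = 1 - side[node]
            improved = True

print("cut value:", cut_value(graph, side))
```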

Network Traffic Generation Tool

Published:

(Paper One) (Paper Two) (Paper Three) (GitHub) (Documentation) Data related to communication networks is often sensitive and proprietary. Consequently, many academic networking papers are published without open-accessing the network traffic data used to obtain the results, and when datasets are published they are often too limited for data-hungry applications such as reinforcement learning. In an effort to aid reproducibility, some authors release characteristic distributions which broadly describe the underlying data. However, these distributions are often not analytically described and may not fall under the classic ‘named’ distributions (Gaussian, log-normal, Pareto, etc.). As a result, other researchers find themselves using unrealistically simple uniform traffic distributions or their own distributions, which are difficult to benchmark universally. This project saw the development of an open-access network traffic generation tool for (1) standardising the traffic patterns used to benchmark networking systems, and (2) enabling rapid and easy replication of literature distributions even in the absence of raw open-access data.
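
The kind of replication the tool enables can be illustrated as follows: given only a characteristic flow-size distribution reported in a paper (the values and probabilities below are invented for illustration), a synthetic workload can be sampled that reproduces it. The tool itself (see the linked GitHub and documentation) provides far richer distribution fitting and traffic-matrix generation than this snippet.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical characteristic distribution reported in a paper: probability mass over
# flow-size values (bytes) rather than a named analytic distribution.
flow_size_values = np.array([1e3, 1e4, 1e5, 1e6, 1e7, 1e8])
probabilities    = np.array([0.40, 0.25, 0.15, 0.10, 0.07, 0.03])

# Sample a synthetic workload of 10,000 flows following the reported distribution.
flow_sizes = rng.choice(flow_size_values, size=10_000, p=probabilities)
print(f"mean flow size: {flow_sizes.mean():.0f} bytes")
```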

publications

Optimal Control of SOAs with Artificial Intelligence for Sub-Nanosecond Optical Switching

Published in Journal of Lightwave Technology (JLT), 2020

Novel approaches to switching ultra-fast semiconductor optical amplifiers using artificial intelligence algorithms (particle swarm optimisation, ant colony optimisation, and a genetic algorithm) are developed and applied both in simulation and experiment. Effective off-on switching (settling) times of 542 ps are demonstrated with just 4.8% overshoot, achieving an order of magnitude improvement over previous attempts described in the literature and standard dampening techniques from control theory.

Recommended citation: C. W. F. Parsonson, Z. Shabka, W. K. Chlupka, B. Goh and G. Zervas, "Optimal Control of SOAs with Artificial Intelligence for Sub-Nanosecond Optical Switching", Journal of Lightwave Technology (JLT), 2020 https://ieeexplore.ieee.org/abstract/document/9124678

Benchmarking Packet-Granular OCS Network Scheduling for Data Center Traffic Traces

Published in Photonic Networks and Devices, 2021

We recently reported hardware-implemented scheduling processors for packet-granular reconfigurable optical circuit-switched networks. Here, we benchmark the performance of the processors under various data center traffic for a range of network loads.

Recommended citation: Joshua L. Benjamin, Christopher W. F. Parsonson, and G. Zervas "Benchmarking Packet-Granular OCS Network Scheduling for Data Center Traffic Traces", Photonic Networks and Devices, 2021 https://opg.optica.org/abstract.cfm?uri=Networks-2021-NeW3B.3

AI-Optimised Tuneable Sources for Bandwidth-Scalable, Sub-Nanosecond Wavelength Switching

Published in Optics Express, 2021

Wavelength routed optical switching promises low power and latency networking for data centres, but requires a wideband wavelength tuneable source (WTS) capable of sub-nanosecond switching at every node. We propose a hybrid WTS that uses time-interleaved tuneable lasers, each gated by a semiconductor optical amplifier, where the performance of each device is optimised using artificial intelligence. Through simulation and experiment we demonstrate record wavelength switch times below 900 ps across 6.05 THz (122×50 GHz) of continuously tuneable optical bandwidth. A method for further bandwidth scaling is evaluated and compared to alternative designs.

Recommended citation: T. Gerard, C. W. F. Parsonson, Z. Shabka, B. Thomsen, P. Bayvel, D. Lavery and G. Zervas, "AI-Optimised Tuneable Sources for Bandwidth-Scalable, Sub-Nanosecond Wavelength Switching", Optics Express, 2021 https://opg.optica.org/oe/fulltext.cfm?uri=oe-29-7-11221&id=449558

Optimal and Low Complexity Control of SOA-Based Optical Switching with Particle Swarm Optimisation

Published in European Conference and Exhibition on Optical Communication (ECOC), 2022

We propose a reliable, low-complexity particle swarm optimisation (PSO) approach to control semiconductor optical amplifier (SOA)-based switches. We experimentally demonstrate less than 610 ps off-on switching (settling) time and less than 2.2% overshoot with 20x lower sampling rate and 8x reduced DAC resolution.

Recommended citation: H. Alkharsan, C. W. F. Parsonson, Z. Shabka, X. Mu, A. Ottino and G. Zervas, "Optimal and Low Complexity Control of SOA-Based Optical Switching with Particle Swarm Optimisation", ECOC-22: European Conference and Exhibition on Optical Communication, 2022 https://opg.optica.org/abstract.cfm?uri=ECEOC-2022-Tu3C.5

Traffic Tolerance of Nanosecond Scheduling on Optical Circuit Switched Data Centre Networks

Published in Optical Fiber Communications Conference and Exhibition (OFC), 2022

PULSE's ns-speed NP-hard network scheduler delivers skew-tolerant performance at 90% input loads. It achieves >90% throughput, 1.5–1.9 µs mean and 16–24 µs tail (99th percentile) latency for up to 6:1 hot:cold skewed traffic in OCS DCNs.

Recommended citation: J. L. Benjamin, A. Ottino, C. W. F. Parsonson and G. Zervas, "Traffic Tolerance of Nanosecond Scheduling on Optical Circuit Switched Data Centre Networks", OFC-22: Optical Fiber Communications Conference and Exhibition, 2022 https://ieeexplore.ieee.org/document/9748332

Traffic Generation for Benchmarking Data Centre Networks

Published in Optical Switching and Networking, 2022

Benchmarking is commonly used in research fields, such as computer architecture design and machine learning, as a powerful paradigm for rigorously assessing, comparing, and developing novel technologies. However, the data centre network (DCN) community lacks a standard open-access and reproducible traffic generation framework for benchmark workload generation. Driving factors behind this include the proprietary nature of traffic traces, the limited detail and quantity of open-access network-level data sets, the high cost of real world experimentation, and the poor reproducibility and fidelity of synthetically generated traffic. This is curtailing the community's understanding of existing systems and hindering the ability with which novel technologies, such as optical DCNs, can be developed, compared, and tested.

Recommended citation: C. W. F. Parsonson, J. L. Benjamin and G. Zervas, "Traffic Generation for Benchmarking Data Centre Networks", Optical Switching and Networking, 2022 https://www.sciencedirect.com/science/article/pii/S1573427722000315

Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks

Under peer review, 2022

From natural language processing to genome sequencing, large-scale machine learning models are bringing advances to a broad range of fields. Many of these models are too large to be trained on a single machine, and instead must be distributed across multiple devices. This has motivated the research of new compute and network systems capable of handling such tasks. In particular, recent work has focused on developing management schemes which decide *how* to allocate distributed resources such that some overall objective, such as minimising the job completion time (JCT), is optimised. However, such studies omit explicit consideration of *how much* a job should be distributed, usually assuming that maximum distribution is desirable. In this work, we show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate. To address this, we propose PAC-ML (partitioning for asynchronous computing with machine learning), which leverages a graph neural network and reinforcement learning to learn how much to partition computation graphs such that the number of jobs which meet arbitrary user-defined JCT requirements is maximised. In experiments with five real deep learning computation graphs on a recently proposed optical architecture across four user-defined JCT requirement distributions, we demonstrate PAC-ML achieving up to 56.2% lower blocking rates in dynamic job arrival settings than the canonical maximum parallelisation strategy used by most prior works.

Recommended citation: C. W. F. Parsonson and G. Zervas "Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks", Under peer review, 2022 Link to be added soon.

Learning to Solve Combinatorial Graph Partitioning Problems via Efficient Exploration

Under peer review, 2022

From logistics to the natural sciences, combinatorial optimisation on graphs underpins numerous real-world applications. Reinforcement learning (RL) has shown particular promise in this setting as it can adapt to specific problem structures and does not require pre-solved instances for these, often NP-hard, problems. However, state-of-the-art (SOTA) approaches typically suffer from severe scalability issues, primarily due to their reliance on expensive graph neural networks (GNNs) at each decision step. We introduce ECORD; a novel RL algorithm that alleviates this expense by restricting the GNN to a single pre-processing step, before entering a fast-acting exploratory phase directed by a recurrent unit. Experimentally, ECORD achieves a new SOTA for RL algorithms on the Maximum Cut problem, whilst also providing orders of magnitude improvement in speed and scalability. Compared to the nearest competitor, ECORD reduces the optimality gap by up to 73% on 500 vertex graphs with a decreased wall-clock time. Moreover, ECORD retains strong performance when generalising to larger graphs with up to 10000 vertices.

Recommended citation: T. D. Barrett, C. W. F. Parsonson and A. Laterre "Learning to Solve Combinatorial Graph Partitioning Problems via Efficient Exploration", Under peer review, 2022 https://arxiv.org/abs/2205.14105

Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

Published in Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023

Combinatorial optimisation problems framed as mixed integer linear programmes (MILPs) are ubiquitous across a range of real-world applications. The canonical branch-and-bound algorithm seeks to exactly solve MILPs by constructing a search tree of increasingly constrained sub-problems. In practice, its solving time performance is dependent on heuristics, such as the choice of the next variable to constrain ('branching'). Recently, machine learning (ML) has emerged as a promising paradigm for branching. However, prior works have struggled to apply reinforcement learning (RL), citing sparse rewards, difficult exploration, and partial observability as significant challenges. Instead, leading ML methodologies resort to approximating high quality handcrafted heuristics with imitation learning (IL), which precludes the discovery of novel policies and requires expensive data labelling. In this work, we propose retro branching; a simple yet effective approach to RL for branching. By retrospectively deconstructing the search tree into multiple paths each contained within a sub-tree, we enable the agent to learn from shorter trajectories with more predictable next states. In experiments on four combinatorial tasks, our approach enables learning-to-branch without any expert guidance or pre-training. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables, with ablations verifying that our retrospectively constructed trajectories are essential to achieving these results.
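
The core idea of turning one search tree into several shorter trajectories can be loosely sketched as below, where a finished branch-and-bound tree (represented as a parent-to-children map) is retrospectively decomposed into root-to-leaf paths, each usable as a short training trajectory. This is a simplified illustration of the general idea, not the paper's exact construction.

```python
# Toy branch-and-bound tree: node -> children created when that node was branched on.
tree = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": [],
    "c": [],
    "d": ["e", "f"],
    "e": [],
    "f": [],
}

def retrospective_trajectories(tree, node="root", path=None):
    """Decompose the finished tree into root-to-leaf paths, each giving a shorter
    trajectory with more predictable next states than the full search tree."""
    path = (path or []) + [node]
    children = tree.get(node, [])
    if not children:
        return [path]
    trajectories = []
    for child in children:
        trajectories.extend(retrospective_trajectories(tree, child, path))
    return trajectories

for trajectory in retrospective_trajectories(tree):
    print(trajectory)
```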

Recommended citation: C. W. F. Parsonson, A. Laterre and T. D. Barrett "Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories", AAAI-23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023 https://arxiv.org/abs/2205.14345?context=cs

A Hybrid Beam Steering Free-Space and Fiber Based Optical Data Center Network

Published in Journal of Lightwave Technology (JLT), 2023

Wireless data center networks (DCNs) are promising solutions to mitigate the cabling complexity in traditional wired DCNs and potentially reduce the end-to-end latency with faster propagation speed in free space. Yet, physical architectures in wireless DCNs must be carefully designed regarding wireless link blockage, obstacle bypassing, path loss, interference and spatial efficiency in a dense deployment. This paper presents the physical layer design of a hybrid FSO/in-fiber DCN while guaranteeing an all-optical, single hop, non-oversubscribed and full-bisection bandwidth network. We propose two layouts and analyze their scalability: (1) A static network utilizing only tunable sources which can scale up to 43 racks, 15,609 nodes and 15,609 channels; and (2) a re-configurable network with both tunable sources and piezoelectric actuator (PZT) based beam-steering which can scale up to 8 racks, 2,904 nodes and 185,856 channels at millisecond PZT switching time. Based on a traffic generation framework and a dynamic wavelength-timeslot scheduling algorithm, the system-level network performance is simulated for a 363-node subnet, reaching >99% throughput and 1.23 µs average scheduler latency at 90% load.

Recommended citation: Y. Liu, J. L. Benjamin, C. W. F. Parsonson, and G. Zervas "A Hybrid Beam Steering Free-Space and Fiber Based Optical Data Center Network", Journal of Lightwave Technology (JLT), 2023 Link to be added soon

A Vectorised Packing Algorithm for Efficient Generation of Custom Traffic Matrices

Published in Optical Fiber Communications Conference and Exhibition (OFC), 2023

We propose a new algorithm for generating custom network traffic matrices which achieves 13x, 38x, and 70x faster generation times than prior work on networks with 64, 256, and 1024 nodes respectively.

Recommended citation: C. W. F. Parsonson, Joshua L. Benjamin, and G. Zervas "A Vectorised Packing Algorithm for Efficient Generation of Custom Traffic Matrices", OFC-23: Optical Fiber Communications Conference and Exhibition, 2023 https://arxiv.org/abs/2302.09970

talks

teaching
