Architects of the largest data centres have set optical engineers a challenge: to create cheaper 100 Gigabit interfaces that span up to 2km. Roy Rubenstein reports
Existing 100 Gig interfaces have reaches that are either too long and costly or too short - and web companies' demands have stirred a flurry of industry activity, with four optical module initiatives announced since the year's start. Cheaper 100 Gig mid-reach interfaces also promise to benefit telecoms, with wireless being one application already identified.
Optical module designers must reconcile two contradictory trends: data centres are getting larger, inevitably lengthening the links between systems, yet optical reach gets shorter with increasing channel speed.
The move from 10 to 100 Gig interfaces increases the lane speed from 10 to 25Gbit/s; 100 Gig uses either four 25 Gig links over parallel fibre, or four 25 Gig light paths on a single fibre with wavelength division multiplexing (WDM). Reach suffers as a result: 10 and 40 Gig multimode interfaces span 300m, or 400m using OM4 fibre, but for the 100 Gigabit 100GBASE-SR10 and 100GBASE-SR4 multimode standards, the reach plummets to 100m (150m on OM4 fibre). 'And 100 metres is not enough for most data centres,' said Arlon Martin, senior director for marketing at Mellanox Technologies.
The next available 100 Gig interface option specified by the IEEE is the single-mode 100GBASE-LR4 standard, but its 10km reach is overkill for most data centre applications. The LR4 is also expensive, costing seven times as much as the 100GBASE-SR10. Certain larger data centre operators have chosen to forgo multimode fibre altogether. Microsoft, for one, has said it will deploy single-mode fibre exclusively in its data centres. Such cloud players want a 100 Gig single-mode interface that is cost-competitive with existing multimode solutions, yet covers this enormous middle ground between 100m and 10km.
'The largest data centre operators will tell you that less than 1km, less than 500m, is their sweet spot,' said Martin Hull, director of product management at switch vendor Arista Networks. What is being discussed is not so much exact point-to-point data centre distances but a module's optical link budget, the amount of loss that can be accommodated for the sent signal to be received.
A signal may pass through one or more fibre patch panels in transit, introducing additional attenuation. For 10km links, the optical link budget is around 6dB; for 2km it is more like 4dB.
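The link-budget reasoning can be sketched in a few lines of Python. The fibre attenuation and connector-loss figures are illustrative assumptions, not values from the IEEE standards or any of the MSAs; only the roughly 4dB budget for a 2km-class module comes from the figures quoted here.

```python
# Assumed loss figures for illustration only; not taken from any specification.
FIBRE_LOSS_DB_PER_KM = 0.4   # assumed single-mode attenuation
CONNECTOR_LOSS_DB = 0.5      # assumed loss per mated connector pair


def link_loss_db(length_km, connector_pairs):
    """Passive loss of a link: fibre attenuation plus connector losses."""
    return length_km * FIBRE_LOSS_DB_PER_KM + connector_pairs * CONNECTOR_LOSS_DB


def fits_budget(length_km, connector_pairs, budget_db):
    """True if the link's passive loss stays within the module's link budget."""
    return link_loss_db(length_km, connector_pairs) <= budget_db


# A 500m link through four patch-panel connections, against a 4dB budget
print(f"{link_loss_db(0.5, 4):.1f}dB", fits_budget(0.5, 4, 4.0))

# A 2km link through eight connections, against the same budget
print(f"{link_loss_db(2.0, 8):.1f}dB", fits_budget(2.0, 8, 4.0))
```

Under these assumed figures the connectors, not the fibre, consume most of the budget, which is why operators talk about loss rather than point-to-point distance.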
'Some companies are at the cutting edge of technology, building very large data centres that have to have access to these reaches,' said Rafik Ward, Finisar's vice president of marketing. But there are also a large number of enterprises that continue to operate smaller data centres and for them, existing IEEE multimode standards and reaches meet their needs.
Finisar estimates that the longer-reach interfaces - up to 2km - will account for 20 per cent of all the interfaces in the future.
Leaf and spine
The larger data centre operators use a flatter, two-tier switching architecture known as leaf and spine. 'The flatter switching architectures require larger quantities of economical links between the leaf and spine switches,' said Dale Murray, principal analyst at market research firm LightCounting.
A 'leaf' can be a top-of-rack switch that connects servers to the larger-capacity 'spine' switches. Enterprises still commonly use the traditional three-tier architecture comprising top-of-rack, aggregation and core switches.
But the larger enterprises are also adopting the leaf-spine architecture for the same reason as the hyper-scale data centre operators: it copes efficiently with the much greater traffic flows across a switching tier that result from server virtualisation and newer applications, whereas traditional applications send traffic up and down the switching tiers. In enterprises, the server blade interface to the top-of-rack switch is only now transitioning from 1 to 10 Gig, while the connection between the leaf and spine switches is at 40 Gig. The larger data centre operators already use 10-Gig server blades and they want 100-Gig links to connect their spine switches to an edge router, or to a network backbone – another tier of high-performance 100-Gig switches connecting the spine to other sections in the data centre hosting leaf-spines.
Meanwhile, the most demanding hyper-scale data centre operators want to connect their leaf and spine switches with 100 Gig now, says Hull. What is holding the operators up is the lack of high-port-count, 100-Gig switches and accompanying, moderately priced 100-Gig optical modules.
The issue of high-port-density 100-Gig switches is now being addressed. Both Arista and Mellanox have recently announced their first 100-Gigabit switches that support the compact QSFP28 transceiver. Meanwhile, optical module makers are busy developing specifications for inexpensive 100 Gig mid-reach QSFP28 modules.
IEEE standard and MSAs
The IEEE set up the 802.3bm Task Force to create a cheaper, 500m interface specification.
Four proposals emerged: parallel single mode (PSM4), coarse WDM (CWDM), pulse amplitude modulation, and discrete multi-tone. But none of the proposals received sufficient backing to pass the required 75 per cent voting threshold, such that no standard was adopted.
This forced the optical industry to change tack, pursuing a multi-source agreement (MSA) strategy to bring mid-reach solutions to market. Since January, four single-mode interfaces have emerged: the CLR4 Alliance, CWDM4, PSM4 and OpenOptics. 'The MSA-based solutions will have two important advantages,' said Murray. 'All will be much less expensive than the 10km, 100-Gig LR4 module, and all can be accommodated by a QSFP28 form factor.'
The 100 Gig PSM4 differs from the other three designs in its use of a parallel ribbon fibre. The PSM4 also has a 500m reach instead of 2km. The design uses four 25-Gig channels, each sent over its own fibre, such that four fibres are used in each direction. In contrast, the CLR4, CWDM4 and OpenOptics all use 4x25-Gig WDM over duplex single-mode fibre.
The PSM4 is technically straightforward to implement and is likely to be the most economical of the interfaces. But while the PSM4 transceiver will likely be the cheapest, parallel fibre is more expensive than duplex, such that any module price advantage is eroded the longer the link. A data centre manager will consider using the PSM4 based on such factors as the parallel single-mode fibre already deployed or whether the PSM4 is deemed the most cost-effective solution for their requirements.
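How a module price advantage is eroded by link length can be illustrated with a simple break-even calculation. The sketch below is hypothetical throughout: the module and cable prices are placeholder numbers chosen to make the arithmetic easy to follow, not market figures.

```python
def link_cost(module_price, cable_price_per_m, length_m):
    """Cost of one link: a module at each end plus the cable between them."""
    return 2 * module_price + cable_price_per_m * length_m


def break_even_length_m(psm4_module, duplex_module, ribbon_per_m, duplex_per_m):
    """Length at which PSM4's module saving is cancelled by dearer ribbon fibre."""
    module_saving = 2 * (duplex_module - psm4_module)
    cable_penalty_per_m = ribbon_per_m - duplex_per_m
    if cable_penalty_per_m <= 0:
        raise ValueError("ribbon cable must cost more per metre than duplex")
    return module_saving / cable_penalty_per_m


# Placeholder prices (hypothetical, arbitrary currency units)
PSM4_MODULE = 400.0     # assumed PSM4 module price
DUPLEX_MODULE = 600.0   # assumed WDM (duplex) module price
RIBBON_PER_M = 1.5      # assumed parallel ribbon cable price per metre
DUPLEX_PER_M = 0.5      # assumed duplex cable price per metre

crossover = break_even_length_m(PSM4_MODULE, DUPLEX_MODULE, RIBBON_PER_M, DUPLEX_PER_M)
print(f"PSM4 is cheaper below {crossover:.0f}m")
```

With these placeholder prices the two options cost the same at 400m; real figures would move the crossover, but the shape of the trade-off is the same.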
'The PSM approach has secondary applications which make it very attractive,' said John D'Ambrosia, chairman of the Ethernet Alliance organisation. The PSM4's 25-Gig channels can be used as individual lower-speed 'breakout' links. Already, a 25-Gigabit Ethernet consortium has been set up with members that include Google, Microsoft, Arista and Mellanox, while the IEEE has started work to create a 25-Gigabit Ethernet standard. The PSM4 module could also support 32-Gig Fibre Channel and high-density 128-Gig Fibre Channel.
In contrast, the OpenOptics MSA, backed by Mellanox and start-up Ranovus, uses the 1550nm C-band and dense WDM, whereas the CLR4 Alliance and CWDM4 operate around 1310nm and use coarse WDM technology.
The 100-Gig OpenOptics uses four 25-Gig channels. The low channel count means that a far wider spacing can be used – some 800GHz compared to the traditional metro and long-haul DWDM channel spacing of 50GHz. The wider spacing relaxes the margins, simplifying module manufacturing, but by using DWDM technology, OpenOptics can add more wavelengths in future such that 16-channel (400-Gig) and even higher channel count designs will be possible. 'There is no plan [from competing mid-reach designs] to go from CWDM to eight or 16 channels,' said Martin.
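The channel arithmetic behind these figures is simple enough to sketch. The 25-Gig lane rate and the 800GHz spacing come from the description above; the helper names are illustrative, and treating the occupied spectrum as the gap between the first and last channel is a simplifying assumption.

```python
LANE_RATE_GBPS = 25  # per-wavelength rate described in the article


def aggregate_rate_gbps(channels):
    """Total capacity of a module carrying `channels` wavelengths."""
    return channels * LANE_RATE_GBPS


def occupied_span_ghz(channels, spacing_ghz):
    """Spectrum between the first and last channel centre (simplified)."""
    return (channels - 1) * spacing_ghz


for channels in (4, 16):
    print(channels, aggregate_rate_gbps(channels), occupied_span_ghz(channels, 800))
# 4 100 2400
# 16 400 12000
```

Four channels at 800GHz span 2.4THz. Sixteen channels at the same spacing would span 12THz, far more than the roughly 4.4THz of the conventional 1530-1565nm C-band (a figure assumed here, not taken from the MSA), so higher channel counts would presumably rely on tighter spacing, which DWDM technology allows.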
Mellanox has in-house silicon photonics technology. Its OpenOptics module design uses an array of common gain sources to generate light in the C-band, with individual wavelengths generated using grating-based waveguides.
The company has already demonstrated a QSFP28 OpenOptics prototype that consumes less than 3W; the QSFP28 has a maximum power rating of 3.5W. 'We think that silicon photonics in general will be very competitive with VCSELs (used for 100 Gig -SR4 and -SR10 multimode solutions),' said Martin. 'That is our target.'
Driving down costs
Meanwhile, the CLR4 Alliance is an Intel and Arista-backed initiative that has garnered wide industry backing (see table). Ciena, an optical transport specialist, has joined the CLR4 Alliance. 'We want to continue to drive the costs out of 100 Gig interconnects – both within the data centre, and between the data centre and the wide area network,' said Steve Alexander, senior vice president and CTO of Ciena.
The CLR4 Alliance, unlike the other three, is an industry grouping and not an MSA. Such a consortium shares a common goal, but its members do not work as closely as companies developing an MSA, which sign a non-disclosure agreement and share intellectual property.
The specifications of the CLR4 and CWDM4 are very similar. Both include forward error correction (FEC), not in the module but as part of the system design, but whereas FEC is fundamental to the CWDM4, it is optional with the CLR4. 'We have focussed on the FEC-enabled [CWDM4] version so that optical manufacturers can develop the lowest possible cost components to support the interface,' said Mitchell Fields, senior director in the fibre-optics product division at Avago Technologies. The FEC adds flexibility, he says, not just in relaxing the components' specification but also by simplifying module testing.
By having FEC only as an option, the CLR4 Alliance's interface avoids the delay associated with FEC, suiting applications such as high-frequency trading where latency is an issue.
The backers of CWDM4 and CLR4 are working to align their specifications and, while it is likely that the two will interoperate, it remains unclear whether they will merge. The CWDM4 specification is set for completion in September, said Avago. The similarity of the CWDM4 and CLR4 means that both designs will come to market simultaneously. 'If you have specific applications that emerge that require one [design] or the other, you basically have both covered,' said Ram Rao, senior manager of product marketing at Oclaro.
Finisar highlights how the emerging 2km 100-Gig interfaces could also be used in wireless networks. One emerging trend in the radio access network is separating the radio equipment from the base station in order to pool the base station functionality. Using simple remote radio units and centralising the base station units promises greater efficiencies and operational gains: radio cell capacity can be moved around as required, unlike current systems where maximum capacity is provisioned for each cell even though it is rarely required. And virtualisation and cloud computing techniques could be used to implement the base station functionality on standard servers rather than specialist hardware.
Optics is needed here as significant traffic is generated between the remote radio heads and the base stations. Dubbed mobile front-hauling, this is where cheaper 100-Gig interfaces could play a role.
'In the wireless network market, there is a large and emerging need for optics that will go up to 2km,' said Ward. For now, though, front-hauling is an emerging application and 10-Gig optical links are sufficient for it. Wireless also requires industrial-temperature optics operating between -40°C and 85°C, whereas data centre optics operate in a more controlled environment.
The industry's view is mixed as to how the four mid-reach 100 Gig designs will fare.
Oclaro, a member of the CWDM4 and PSM4 MSAs as well as the CLR4 Alliance, says it is seeing end-user interest in both the parallel and duplex modules. 'With the PSM4 and the CWDM4/CLR4, you are matching up with the [data centre] cabling infrastructure,' said Rao.
Ciena, while having joined the CLR4 Alliance, welcomes all four developments. 'The CLR4 Alliance as well as CWDM4, PSM4, and OpenOptics are promising initiatives as they are focussed on identifying new ways to scale network capacity in a more cost-effective manner,' said Alexander.
But D'Ambrosia of the Ethernet Alliance regrets that four specifications have emerged. 'My own personal belief is that it would be better for the industry overall if we didn't have so many choices,' he said. 'But the reality is there are a lot of different applications out there.'
Avago's Fields says it is hard to give guidance as to how the market will develop. Avago believes that the 100GBASE-LR4 will become the smallest of the segments as it will take time before it appears in a QSFP28. The hyper-scale data centre operators will be the early adopters of the mid-reach 100 Gig QSFP28s, whether PSM4 or CLR4/CWDM4, skewing the market towards single mode. But, once enterprises start adopting 100 Gig widely, more and more 100-Gig multimode modules will be used.
LightCounting expects the PSM4 and a merged CWDM offering to find strong market traction. 'Avago, Finisar, JDSU and Oclaro are participating in both categories, demonstrating that each has its own value proposition,' concluded Murray.