Fintech Lessons from the Interac Outage

John J. Schaub 

Aug 12, 2022 

For international readers: Interac is a Canadian interbank network created in 1984 and cooperatively owned by all the major financial institutions in Canada. The Interac network facilitates roughly 25M transactions per day (in a country of 38M people) and in a very real way represents the totality of the Canadian person-to-person payments ecosystem via the eTransfer product, as well as a very significant portion of the merchant payments ecosystem via debit card payments. This sort of massively adopted, low-cost solution is a benefit of a highly regulated banking system, but it brings with it the obvious issue of having all your eggs in one basket. If Interac ever failed, a large portion of the Canadian payments landscape would grind to a halt, and one month ago, on July 8th 2022, that is exactly what happened, due to a failure of the underlying cellphone network operated by Rogers Communications. For roughly 24 hours both the debit and eTransfer networks were inaccessible, and Canadian consumers were unable to send money or pay via debit. Naturally this situation caused a spike in demand for cash, and in some cases withdrawals from ATMs became impossible due to a lack of cash supply. The incident is obviously terrible for Interac, but it provides a number of important lessons for FinTech companies striving to reach massive scale.

Lesson 1: Finance is Serious - Building the systems that people use to pay their mortgage or buy groceries for their families is unlike most other roles in technology. Every person on your team, from the developer writing the code to the lawyer negotiating the partnership agreements, needs to understand that the services you are building are mission-critical in people's lives and that even the appearance of instability will do long-term damage to your brand. 

Lesson 2: Scale has Downsides - Every FinTech founder dreams of reaching the level of market dominance that Interac has achieved, but you need to be aware that the stakes will increase as you scale. In the case of the Interac outage, various government Ministers released aggressive statements before the outage was even resolved, and within three days of the incident additional regulations were already being put in place. When you have a near-monopoly level of market dominance, the reaction to any failure is going to be swift and aggressive and will have an impact on your brand and business well beyond the technical issues themselves. 

But do not think that just because you are small you can put off worrying about the increased stakes until you reach a dominant market position. At every level of growth the challenges you face will ramp up, and regulators and criminals alike will pay increasing attention to what you are doing. As a real-world example, I have worked with a FinTech that, like clockwork, saw increasingly sophisticated cyber attacks within days of every early mention in the media. As a FinTech leader you need to stay ahead of the curve on scale, constantly re-evaluating your position to ensure you have the necessary systems and processes in place before the scale hits, because trying to recover after the fact is nearly impossible.

Lesson 3: Sometimes you shouldn't eat your own dog food - In technology there is a popular piece of wisdom, 'eat your own dog food', which means that your staff should use your product so they are well aware of its strengths and weaknesses. This advice is taken as gospel for good reason, but a big problem arises when you are a system provider and the dog food your staff are eating is the very tool they need to maintain your system. Specifically, in this incident the Rogers technicians working to resolve the underlying network issues were hampered by the fact that they use Rogers phones to communicate with one another, phones which were of course useless with the Rogers network unavailable. So while you should absolutely 'eat your own dog food', you should always have a backup plan in place that will let your team quickly recover from a system failure. One of the more obvious outcomes of this incident was a government mandate that the various telecommunications companies put mutual assistance agreements in place.

Lesson 4: Maintenance is Dangerous - Every technical person already knows this, but all too often executives who come up via non-technical paths think of maintenance as simply a routine part of system operation. The reality is that maintenance, while absolutely essential, is the most dangerous time in the life of any complex system. The Chernobyl disaster, the recent Facebook outage, and this incident all stemmed from errors that occurred during system maintenance. In any mission-critical system, maintenance must be thoroughly planned, with contingencies and rollback plans documented, and the work must be executed by your best technical staff. There can be no exceptions to this, and taking maintenance seriously needs to be baked into the company culture from day zero.

Lesson 5: Single Points of Failure will Fail - A single point of failure is any critical component or system for which there exists no backup. In highly available systems design, single points of failure are viewed very unfavourably but might be deemed acceptable if they are suitably tested and controlled. The surprising thing in this case is that a vendor solution (the Rogers Communications network) was deemed an acceptable single point of failure by Interac. In my experience, systems designers tend to accept only internally controlled SPOFs, if they are willing to accept them at all, and I would have thought that a hot spare design would have existed in a case like this. That is, terminals would maintain live connections via multiple different networks and be able to switch to the secondary network should the primary become unavailable.
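The hot spare pattern described above can be sketched in a few lines. To be clear, this is purely an illustration of the failover concept, not a description of Interac's actual terminal architecture; the `NetworkLink` and `HotSpareTerminal` classes and the carrier names are assumptions made for the example.

```python
class NetworkLink:
    """Hypothetical wrapper around a single carrier connection."""
    def __init__(self, name):
        self.name = name
        self.up = True  # in practice this would be driven by a live health check

    def healthy(self):
        return self.up

    def send(self, payload):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"sent via {self.name}"


class HotSpareTerminal:
    """Maintains live links to two carriers and fails over automatically."""
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def send(self, payload):
        # Try the primary first, then fall through to the spare.
        for link in (self.primary, self.secondary):
            if link.healthy():
                try:
                    return link.send(payload)
                except ConnectionError:
                    continue
        raise ConnectionError("all carrier links are down")


rogers = NetworkLink("Rogers")
bell = NetworkLink("Bell")
terminal = HotSpareTerminal(rogers, bell)

print(terminal.send("debit-auth"))  # uses the primary while it is healthy
rogers.up = False                   # simulate the carrier outage
print(terminal.send("debit-auth"))  # transparently fails over to the spare
```

The key design point is that both links are kept alive continuously (a hot spare), so the switch happens at transaction time with no reconnection delay.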

The challenge you will face as a small FinTech is that redundancy costs money, potentially a lot of money. While in this case Interac's decision to rely solely on a single network was clearly an error, a smaller FinTech would simply be unable to bear the added cost of multiple networks. For the record, I have personally been involved in deploying a similar debit network on a vastly smaller scale and was part of the decision to use a single network with no backup, so I understand how this decision was made. Based on my experience, my suspicion is that the underlying design decision to rely on only one network was made when the Interac system was vastly less dominant than it is now and was never re-evaluated as its scale increased (see Lesson 2). 

Lesson 6: Contracts are Pieces of Paper - In a statement released four days after the incident, Interac CEO Mark O'Connell cites "availability commitments from our suppliers" that could not be met. While it is technically true that the cause of the failure was the Rogers network, contractual protections should not be relied upon to ensure system reliability, especially as a smaller FinTech. Availability commitments of this sort merely lay out in clear terms the standard that a vendor agrees to meet and the penalty that will be applied should the vendor fail to meet it. Yes, the threat of penalties should motivate vendors to keep their systems reliable, but in reality everything in business becomes a cost-benefit calculation, and the damage to your reputation never matters as much to your vendor as you would hope. By no means am I saying that contracts are useless, quite the contrary, but if your business depends on a system or service being available you need to do more than require it in a contract. In short, when you are asking the question 'what do we do if this vendor fails to deliver?' the answer cannot be 'we will sue them'. As a small FinTech you will be out of business long before you could successfully sue a large vendor that failed to meet its contractual obligations. 

Hopefully this incident gives you some food for thought as you build out your FinTech. If you'd like to discuss anything I've written do not hesitate to reach out.