Recent outages – explanations and resolutions

Over the last few days, Active911 has had several problems that we should have anticipated but did not. We have already described some potential iOS 11 problems; here we explain the other problems and what we are doing to fix them.

Webview/PC App Not Connecting or Not Getting Alerts:

A hard-to-find, hard-to-diagnose bug could keep Webview from connecting to our servers when lag between the Webview client and slowness on our servers occurred at the same time. Yesterday, a chain reaction caused ~100 Webview devices to hit this bug. That overloaded our real-time communication service's ability to log people in, which delayed response button taps and real-time updates.

How we fixed it:
We isolated the Webview bug and rolled out a fix to all Webview clients last night. If Webview still does not connect for you, a force refresh should fix it: on Windows press Ctrl+F5, and on a Mac press Cmd+Shift+R. If you still have problems after a force refresh, we have added a test button to the settings menu that will help us track down the problem. Please run the test, copy its output into an email, and send it to support@active911.com.


In addition to rolling out the fix, we thought we could resolve the problem with some server configuration changes. That worked until this morning, when our developers concluded that our real-time update service, which should have been able to handle the load, could not, and needed to be replaced completely. This morning we built a brand-new server that can easily handle what the old one could not. Responses and alerting should now be faster than before.

Assignments randomly missing:

When we were building assignments, we decided to test some experimental technology that had the potential to help us make better software. Once we deployed assignments and started to really use that technology, we kept running into problems and unexpected difficulties, which is why assignments has been in beta for so long. This past week, the assignments server became unable to handle the load we were putting on it, and it often returned nothing instead of what the apps were asking for. The increased load also slowed down the startup of all our clients, causing odd behavior such as alerts not showing up on the alerts page.

How we are fixing it:
Because of the unexpected side effect of alerts not showing up, we had to turn off assignments. Today we are rebuilding the pieces that became overloaded, and over the next week or two we will migrate fully off the experimental technology we were trying. Once the new assignments server has been built and tested, I’ll update this post.

Better Alert Types

[Figure: This is how we currently classify alerts. The pie chart is getting an upgrade.]

We have added some features to our website so alarms can be classified by type.

Many of you are familiar with NFIRS codes. Codes in the 100 series (100 to 199) mean “fire”. For example:

NFIRS 162. Outside equipment fire. Includes outside trash compactors, outside HVAC units, and irrigation pumps. Excludes special structures (110 series) and mobile construction equipment (130 series).
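As a rough illustration, the hundreds digit of a three-digit NFIRS incident type code is what selects its series, so 162 falls in the 100 (fire) series. The sketch below assumes the NFIRS 5.0 series names; `NFIRS_SERIES` and `nfirs_series` are hypothetical names for illustration, not part of Active911's software.

```python
# NFIRS 5.0 incident type series, keyed by the hundreds digit of the code.
NFIRS_SERIES = {
    1: "Fire",
    2: "Overpressure rupture, explosion, overheat (no fire)",
    3: "Rescue and emergency medical service incident",
    4: "Hazardous condition (no fire)",
    5: "Service call",
    6: "Good intent call",
    7: "False alarm and false call",
    8: "Severe weather and natural disaster",
    9: "Special incident type",
}


def nfirs_series(code: int) -> tuple[int, str]:
    """Return the series code (e.g. 100) and its name for a 3-digit NFIRS code."""
    if not 100 <= code <= 999:
        raise ValueError(f"not a 3-digit NFIRS incident type code: {code}")
    hundreds = code // 100  # 162 -> 1
    return hundreds * 100, NFIRS_SERIES[hundreds]


print(nfirs_series(162))  # (100, 'Fire')
```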

NFIRS is a US standard, but we have modified our system to support other classification systems as well. Currently NFIRS is the only classifier we have programmed; if you want another one, let us know what system your agency or country uses.

  1. We have added a column on the alerts tab that lists each alert’s type for easy reference.
  2. You can select an alert type by clicking on the alert and scrolling to the bottom of the window.
  3. You can do a full-text search by name or by code number. We have also reorganized the codes into an easy-to-browse list. When you have made a selection, click “save”.
  4. “See also” suggestions, if any, will be displayed below the description. NFIRS codes are complicated, so we have tried to make it easy to choose the right one.
  5. We are adding a server component that learns from your selections and auto-selects types for future alarms. To train it, we are asking users to log into their web consoles and manually classify a couple dozen recent alarms. We will use this data for machine learning.
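The search in step 3 can be pictured as a simple filter over the code list. This is a minimal sketch, not Active911's implementation: `AlertType`, `ALERT_TYPES`, and `search_alert_types` are hypothetical names, the list holds only the four codes mentioned in this post (with the labels used here), and it assumes that typed digits should match the start of a code while typed words can match anywhere in the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertType:
    code: int
    name: str


# Illustrative subset only; labels follow the examples used in this post.
ALERT_TYPES = [
    AlertType(111, "Structure fire"),
    AlertType(162, "Outside equipment fire"),
    AlertType(651, "Smoke"),
    AlertType(733, "Smoke detector malfunction"),
]


def search_alert_types(query: str, types=ALERT_TYPES) -> list[AlertType]:
    """Prefix match on the code number, or case-insensitive substring match on the name."""
    q = query.strip().lower()
    if not q:
        return list(types)  # empty search shows the whole list
    return [t for t in types if str(t.code).startswith(q) or q in t.name.lower()]


print([t.code for t in search_alert_types("smoke")])  # [651, 733]
print([t.code for t in search_alert_types("16")])     # [162]
```

Matching code prefixes means typing "1" narrows the list to the fire series, which mirrors how the codes are grouped.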

Some users have pointed out that a machine cannot be accurate 100% of the time, especially since an alert’s classification often changes based on what is found on scene. A “651 smoke” alarm might turn into a “111 Structure Fire” or a “733 Smoke detector malfunction”. That is a fair point. To keep things clear, human-classified alarms will be shown on the Alerts tab in dark letters with the full 3-digit code. Computer-classified alarms will show only the hundreds (series) digit and will be grayed out. For example:

111 Fire (black, 3 digits) = Human classified

100 Fire (gray italics, 1 digit) = Computer classified

Of course, if the computer gets it wrong you can always correct it manually.
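The display rule above, including a manual correction overriding the computer's guess, could be sketched like this. It is an assumption-laden illustration rather than Active911's code: `AlertClassification`, `alert_type_label`, the style strings, and the partial `SERIES_NAMES` table (three NFIRS series only) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of NFIRS series names, keyed by the hundreds digit.
SERIES_NAMES = {1: "Fire", 6: "Good intent call", 7: "False alarm and false call"}


@dataclass
class AlertClassification:
    computer_code: Optional[int] = None  # type predicted by the server, if any
    human_code: Optional[int] = None     # type chosen or corrected by a user, if any


def alert_type_label(c: AlertClassification) -> tuple[str, str]:
    """Return (text, style) for the alert type column on the Alerts tab."""
    if c.human_code is not None:
        # A human selection always wins and shows the full 3-digit code.
        name = SERIES_NAMES.get(c.human_code // 100, "Unknown")
        return f"{c.human_code} {name}", "black"
    if c.computer_code is not None:
        # A computer guess shows only the series, e.g. 651 -> 600.
        series = c.computer_code // 100 * 100
        name = SERIES_NAMES.get(c.computer_code // 100, "Unknown")
        return f"{series} {name}", "gray italic"
    return "", "none"  # not classified yet


alert = AlertClassification(computer_code=651)
print(alert_type_label(alert))  # ('600 Good intent call', 'gray italic')
alert.human_code = 733  # a user corrects the type after the call
print(alert_type_label(alert))  # ('733 False alarm and false call', 'black')
```

Keeping the computer's guess and the human's choice in separate fields means a correction never loses the original prediction, which would also be useful as training data.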

The next step, once alarms are being classified correctly, is for us to update the pie chart to show the new classification system.