Backyard Hockey League (BHL)

Aug 05, 2022 • 6 minutes to read

Root Cause Analysis - BHL data loss and KBHL/BHL video streaming issues

  • Date: 2022-08-04
  • Categories: video, nest, firebase, security
  • Authors: Dan Fernandez
  • Severity: 1 Sev 0, 1 Sev 1, multiple Sev 2s
  • Last Updated: Tuesday Aug 9, 4:21PM

Summary

The Backyard Hockey League website had no video stream in the YouTube live feed. The preliminary root cause is below.

The BHL (adult) game suffered data loss following roster changes after the 1st period. The root cause of this issue is still under investigation and the timeline will be updated.

Detection

User-detected error at ~4pm while going through the 8-step manual configuration for the Nest Cameras.

Impact

  • Sev 1: No video on live stream for both kids and adult leagues
  • Sev 0: Adult/BHL - All first period scoresheet points were lost. Second period scoresheet points were also lost but were recovered post-game by manually re-entering data from a screenshot taken before the roster change

Overview

BHL suffered a series of cascading failures due to limited testing time, undocumented API behavior for the Nest API and as-yet-unknown issues for data loss.

No Nest video

Issue: Nest changes camera unique identifiers per account

Nest has a concept of apps (BHL Live is the app) and users (Dan’s personal Gmail account and a Backyard Hockey Gmail account) that must be authorized to access the cameras. The BHL Live app, as shown below, shows up twice, once per Gmail account.

Google Nest showing the same app for two accounts

Both links pull identical information from Nest:

The “Nest Home” (ex: Our House), The Rooms in the home (ex: Rink), and the cameras in a room (ex: LeftRink, RightRink).

If you went through the authorization consent screen for the top link and then the bottom link, the home, rooms, and camera names would appear identical (ex: you see the Rink room with two cameras, LeftRink and RightRink), but…

The Nest API will return DIFFERENT UNIQUE IDENTIFIERS for the same camera. The customName property will be the same, but the unique ID is different!

The code below shows a snippet of the hard-coded “ID” for the LeftRink custom-named camera. Since the ID was thought to be unique, it would reduce the need for an additional getDevices() API call, or the need to dynamically get all devices and apply brittle code to decide which camera is for the left versus right.

const leftCamera = "/enterprises/{IDENTICAL_PROJECT_ID}/devices/{CAMERA_ID_WE_HAVE_ALWAYS_USED}";

Here is the response to a Postman request to the getDevices API after authorizing through the second link:

    "name": "enterprises/{IDENTICAL_PROJECT_ID}/devices/{COMPLETELY_DIFFERENT_CAMERA_ID}",
    "traits": {
        "sdm.devices.traits.Info": {
            "customName": "LeftRink"
        },
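
One alternative to hard-coding the ID is to look the camera up by its customName in the device list returned for whichever account granted the token. The sketch below assumes the list response carries a `devices` array whose items have a `name` and the `sdm.devices.traits.Info` trait, as in the snippet above. `findCameraByCustomName` and the sample IDs are hypothetical and are not taken from the BHL code.

```javascript
// Hypothetical helper: resolve a camera's device name from its customName.
// Assumes `response.devices` is an array shaped like the snippet above.
function findCameraByCustomName(response, customName) {
  const devices = (response && response.devices) || [];
  const matches = devices.filter((device) => {
    const info = device.traits && device.traits["sdm.devices.traits.Info"];
    return Boolean(info) && info.customName === customName;
  });
  if (matches.length !== 1) {
    // Zero or duplicate names would make the lookup ambiguous, so fail loudly.
    throw new Error(
      `Expected exactly one camera named "${customName}", found ${matches.length}`
    );
  }
  return matches[0].name;
}

// Sample data with placeholder IDs (not real device IDs).
const sampleResponse = {
  devices: [
    {
      name: "enterprises/project-1/devices/camera-a",
      traits: { "sdm.devices.traits.Info": { customName: "LeftRink" } },
    },
    {
      name: "enterprises/project-1/devices/camera-b",
      traits: { "sdm.devices.traits.Info": { customName: "RightRink" } },
    },
  ],
};

const leftCameraName = findCameraByCustomName(sampleResponse, "LeftRink");
console.log(leftCameraName);
```

This costs the extra device listing call the hard-coded ID was meant to avoid, and it still depends on the custom names staying unique and unchanged.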

This started a cascading set of issues, compounded by late or missing testing time, a day job, a number of manual, error-prone steps for the site (including copy/pasting configuration), and everything else that needed to be done. Worse yet, we had to scramble to clear debris from the rink.

Nest API Error

Access to XMLHttpRequest at 'https://smartdevicemanagement.googleapis.com/v1/enterprises/[CAMERA_ID_WE_HAVE_ALWAYS_USED]:executeCommand' from origin 'http://localhost:{PORT}' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

While this error may lead one to believe that the issue is CORS, it’s actually because the granted access token has access to the COMPLETELY_DIFFERENT_CAMERA_ID and not CAMERA_ID_WE_HAVE_ALWAYS_USED. When it tries to go to the latter, that’s when the error appears.
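
A check along the following lines, run right after authorization, might surface the mismatch directly instead of as a misleading CORS message. It is only a sketch: `describeCameraAccess` is a hypothetical helper, and `visibleDeviceNames` stands in for the `name` values returned by a device listing made with the current access token.

```javascript
// Hypothetical pre-flight check: is the configured camera visible to this token?
function describeCameraAccess(configuredName, visibleDeviceNames) {
  // Tolerate a leading slash in the configured value, as in the hard-coded constant.
  const normalized = configuredName.replace(/^\//, "");
  if (visibleDeviceNames.includes(normalized)) {
    return { ok: true, message: `Camera ${normalized} is accessible.` };
  }
  return {
    ok: false,
    message:
      `Configured camera ${normalized} is not among the ` +
      `${visibleDeviceNames.length} device(s) this token can access. ` +
      "The token may have been granted through a different account.",
  };
}

// Placeholder IDs that mirror the situation described above.
const accessCheck = describeCameraAccess(
  "/enterprises/project-1/devices/old-camera-id",
  ["enterprises/project-1/devices/new-camera-id"]
);
console.log(accessCheck.message);
```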

Accidentally breaking every Firebase API

Trying to fix what was originally seen as a CORS issue led me to install a Google Chrome extension called Allow CORS: Access-Control-Allow-Origin. Enabling this will cause every Firebase API request to fail, but I only discovered this after trying to load the website and seeing it crash. The fix was to disable/uninstall the extension.

Nest is brittle / complicated

Nest Camera APIs are overly complicated and brittle, and their docs are inaccurate.

You need a:

  1. Google API Authorization
  2. Nest Project
  3. Nest Device Access Sandbox
  4. Google Cloud Project - totally different than Nest Project
  5. Nest Project ID
  6. Nest Authorization Code (unknown expiration)
  7. Nest Device IDs - These are apparently not unique
  8. OAuth CLIENT_ID - You must set authorized JavaScript Origins (CORS) & redirect URIs
  9. OAuth CLIENT_SECRET
  10. ACCESS_TOKEN (expires every 5m), must be refreshed or the stream will stop
  11. REFRESH_TOKEN - ~24-48 hour expiration
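
Since the stream stops when the ACCESS_TOKEN lapses, the refresh has to happen slightly ahead of expiry. The sketch below shows one way to time that and to build the body of a standard OAuth 2.0 refresh_token grant, which would be POSTed to Google's OAuth token endpoint. The function names, the 60-second margin, and the placeholder credentials are assumptions for illustration, and the 5-minute lifetime is the figure quoted in the list above.

```javascript
// Decide whether the access token should be refreshed now.
// `issuedAtMs` and `expiresInSec` are assumed to be recorded when the token is received.
function shouldRefresh(issuedAtMs, expiresInSec, nowMs, marginSec = 60) {
  const expiresAtMs = issuedAtMs + expiresInSec * 1000;
  return nowMs >= expiresAtMs - marginSec * 1000;
}

// Build the form-encoded body for an OAuth 2.0 refresh_token grant.
function buildRefreshBody({ clientId, clientSecret, refreshToken }) {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    refresh_token: refreshToken,
    grant_type: "refresh_token",
  }).toString();
}

// Example: a token issued at t=0 with a 300-second (5-minute) lifetime.
const tooEarly = shouldRefresh(0, 300, 200 * 1000); // 200s in: still outside the margin
const timeToRefresh = shouldRefresh(0, 300, 250 * 1000); // 250s in: inside the margin

// Placeholder credentials only.
const refreshBody = buildRefreshBody({
  clientId: "CLIENT_ID",
  clientSecret: "CLIENT_SECRET",
  refreshToken: "REFRESH_TOKEN",
});
console.log(tooEarly, timeToRefresh, refreshBody);
```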

BHL Scoresheet Data Loss

In the adult BHL game (~8:39PM), switching rosters between periods to balance the teams caused the scoresheet to lose all 1st period scoring data (~7-10 points total).

Issue: With multiple tabs open, one for the Roster page and another for the Scoresheet Manager, old data from the Roster page overwrote new data from the Scoresheet Manager page.

Both the Roster web page and the Scoresheet Manager point to the same Game Object. The simplified data model for a game object looks like this:

Game Object
- ID: Game date (ex: 2022-08-04)
- Dark Team Roster - Array of players (skaters, goalies)
- Light Team Roster - Array of players (skaters, goalies)
- Game Event Log - Array of scoring events, where each event is a new item in the array. This works similar to a transactional log in that even a deleted goal (ex: a goal is mistakenly added to the wrong player) would add a new item to the array versus deleting an item from the array. Score calculations/stats are done by computing additions/subtractions
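
As an illustration of the append-only log, the score can be derived by replaying every event in order. The event shape used here (`type`, `team`, `player`) is a hypothetical stand-in, since the real field names are not shown in this write-up.

```javascript
// Hypothetical event shape:
// { type: "goalAdded" | "goalRemoved", team: "dark" | "light", player: string }
function computeScore(gameEventLog) {
  const score = { dark: 0, light: 0 };
  for (const event of gameEventLog) {
    if (event.type === "goalAdded") {
      score[event.team] += 1;
    } else if (event.type === "goalRemoved") {
      score[event.team] -= 1;
    }
  }
  return score;
}

const sampleLog = [
  { type: "goalAdded", team: "dark", player: "Player A" },
  { type: "goalAdded", team: "light", player: "Player B" }, // entered by mistake
  { type: "goalRemoved", team: "light", player: "Player B" }, // correction is appended, nothing is deleted
  { type: "goalAdded", team: "dark", player: "Player C" },
];

const sampleScore = computeScore(sampleLog);
console.log(sampleScore); // { dark: 2, light: 0 }
```

Because corrections are appended rather than deleted, the log for a game in progress should only ever grow, which is the property the data loss fix relies on.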

Series of events

  1. Manager Page: Add one goal to Game Object - GameEventLog length == 1 in both the local client object and Firebase
  2. Load Roster Page: Swap Jason Poon to the other team - Roster Page also has GameEventLog length == 1 in the local client and Firebase
  3. Manager Page: Play the full 1st period, adding approx. 13 total goals - GameEventLog length == 13 in the local client object and Firebase
  4. ERROR: Switch to Roster Page: This page was not reloaded, so it still held the local client object from step 2 - GameEventLog length == 1 on the “old” local client object, but 13 in Firebase

Clicking Save then overwrote the Game Object with the outdated client object (1 event), losing all data in the Firebase object (13 events).
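
The sequence above can be reproduced with a small in-memory simulation. The `store` object is a stand-in for the saved Game Object and does not use the Firebase client; `load`, `save`, and the field names are hypothetical.

```javascript
// In-memory stand-in for the stored Game Object (not the real Firebase client).
const store = { game: { id: "2022-08-04", gameEventLog: [] } };

// Each "tab" loads its own copy and later saves that whole copy back.
const load = () => JSON.parse(JSON.stringify(store.game));
const save = (game) => {
  store.game = JSON.parse(JSON.stringify(game));
};

// Step 1: the Manager Page adds one goal.
const managerTab = load();
managerTab.gameEventLog.push({ type: "goalAdded", n: 1 });
save(managerTab);

// Step 2: the Roster Page loads while the log has 1 event.
const rosterTab = load();

// Step 3: the Manager Page records 12 more goals (13 total).
for (let n = 2; n <= 13; n++) {
  managerTab.gameEventLog.push({ type: "goalAdded", n });
  save(managerTab);
}
const eventsBeforeRosterSave = store.game.gameEventLog.length;

// Step 4: the stale Roster Page saves its whole object, including its old log.
rosterTab.darkRoster = ["swapped player"];
save(rosterTab);
const eventsAfterRosterSave = store.game.gameEventLog.length;

console.log(eventsBeforeRosterSave, eventsAfterRosterSave); // 13 1
```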

Resolution

Nest Camera

  • The Nest camera stream was working again (1AM) after discovering that the unique identifiers aren’t unique. As part of the diagnosis to rule out CORS issues, the manual steps were redone, the camera identifiers were confirmed to match, and, just in case, new localhost entries with a port suffix (ex: localhost:{PORT}) were added at https://console.cloud.google.com/apis/credential
  • Uninstall Allow CORS Extension - Not even once :)

Data Loss

  • Saving new versions of the Game Object will now compare the # of events and block if the number of events is less than the number of events in Firebase. This change has been deployed to production.
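
A minimal sketch of that guard, assuming the game object exposes its log as a `gameEventLog` array (a hypothetical field name), could look like this. It is not the deployed code.

```javascript
// Allow a save only if the local copy has at least as many events as the stored copy.
function canSaveGame(localGame, storedGame) {
  const localCount = (localGame.gameEventLog || []).length;
  const storedCount = (storedGame.gameEventLog || []).length;
  return localCount >= storedCount;
}

// The stale Roster Page copy (1 event) versus the stored game (13 events).
const staleCopy = { gameEventLog: new Array(1).fill({ type: "goalAdded" }) };
const storedCopy = { gameEventLog: new Array(13).fill({ type: "goalAdded" }) };

const staleSaveAllowed = canSaveGame(staleCopy, storedCopy);
const freshSaveAllowed = canSaveGame(storedCopy, storedCopy);
console.log(staleSaveAllowed, freshSaveAllowed); // false true
```

A count comparison only catches staleness that shows up in the event log. A stale tab with the same number of events could still overwrite a newer roster, and reading the stored count and then writing are two separate steps, so a version or timestamp check inside a database transaction may be worth considering later.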

Remediation Steps

Nest

  • [x] Automate all manual steps to set up the left/right camera and ensure the same app is linked
  • [x] Build backup branches with no changes
  • Test the night before and morning of
  • Simplify the branch/merge strategy between all the apps to avoid config changes

BHL

  • [x] Block saving games with fewer events than the currently saved game
  • Build a backup listener service for game change events that logs game changes locally, so the Game Event log is backed up if Firebase or wifi goes down
  • Take screenshots of the Event log versus scoresheet if this locks up again
  • Build a data entry form for adding players versus the manual, error-prone mechanism
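
The backup listener idea could start as something as small as the sketch below: every game change handed to it is copied and timestamped, and the copy with the longest event log can be pulled back out for recovery. `createBackupLog`, `record`, and `bestSnapshot` are hypothetical names, and the storage here is in memory only, so a real version would need to persist the entries somewhere that survives a page reload.

```javascript
// Hypothetical local backup: keeps a timestamped copy of every game change it is given.
function createBackupLog(now = () => Date.now()) {
  const entries = [];
  return {
    // Call this from whatever change listener the app uses.
    record(game) {
      entries.push({ savedAt: now(), game: JSON.parse(JSON.stringify(game)) });
    },
    // Return the snapshot with the longest event log, or null if none exist.
    bestSnapshot() {
      let best = null;
      for (const entry of entries) {
        const count = entry.game.gameEventLog.length;
        if (best === null || count > best.game.gameEventLog.length) {
          best = entry;
        }
      }
      return best;
    },
    size: () => entries.length,
  };
}

// Replay of the incident: 1 event, then 13 events, then the stale 1-event overwrite.
const backup = createBackupLog();
backup.record({ gameEventLog: new Array(1).fill({ type: "goalAdded" }) });
backup.record({ gameEventLog: new Array(13).fill({ type: "goalAdded" }) });
backup.record({ gameEventLog: new Array(1).fill({ type: "goalAdded" }) });

const recovered = backup.bestSnapshot();
console.log(backup.size(), recovered.game.gameEventLog.length); // 3 13
```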

Disaster Recovery

  • Build a low tech plan if Firebase goes down, wifi goes down, the laptop is busted, etc
  • Low tech: Provide a backup notepad in case we need to track goals/assists/points

Timeline

  • First Update: Fri Aug 5 4:36PM
  • Second & Final Update: Tue Aug 9 4:21PM