Data Storage

CrashCABN uses Azure services to store and transmit data: Key Vault for secrets, SQL for administration, and blob and queue storage for crash processing.

Containers

The blob storage account must have a container named "crashes", which CrashCABN uses to store retail crashes.

Retail Crash Expiry

For legal compliance, crash data and files collected from any end-user retail build or environment must not be retained for more than 30 days. To enforce this, retail crashes are kept in central storage with policies that automatically delete blobs 30 days after they are ingested. See the Azure documentation site for more details. Note that these policies are not required for internal development crashes.
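As an illustration only, a deletion policy of this kind could be expressed as an Azure Blob Storage lifecycle management rule. The sketch below builds such a rule as JSON. The rule name is hypothetical, and it assumes that retail crashes live under the "crashes" container and that a blob's last-modified time matches its ingestion time; the policy actually deployed may differ.

```python
import json

RETENTION_DAYS = 30  # legal retention limit for retail crash data


def build_retail_expiry_policy(container: str = "crashes",
                               days: int = RETENTION_DAYS) -> dict:
    """Build a lifecycle management policy that deletes old block blobs in `container`."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "delete-retail-crashes",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Restrict the rule to blobs inside the container.
                        "prefixMatch": [f"{container}/"],
                    },
                    "actions": {
                        "baseBlob": {
                            # Assumes blobs are never modified after ingestion.
                            "delete": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_retail_expiry_policy(), indent=2))
```

Lifecycle rules are not applied instantly, so a threshold slightly below 30 days may be needed if the limit has to be met strictly.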

Notify and Acknowledge Deletion

In addition to clearing our own storage of legally expired data, we must ensure that any files engineers have downloaded are also deleted at the appropriate time. To accomplish this, we use a SQL database to track retail crash file downloads and an Azure Function that sends automatic email reminders asking engineers to acknowledge that the data has been properly destroyed.
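The tracking query behind such reminders might look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for the SQL database. The table layout, column names, and the choice to measure expiry from ingestion time are all assumptions made for illustration, not the actual schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention limit for retail crash data

# Hypothetical schema: one row per retail crash file handed to an engineer.
SCHEMA = """
CREATE TABLE downloads (
    id              INTEGER PRIMARY KEY,
    crash_id        TEXT    NOT NULL,
    engineer_email  TEXT    NOT NULL,
    ingested_at     INTEGER NOT NULL,  -- Unix time the crash was ingested
    acknowledged_at INTEGER            -- NULL until deletion is acknowledged
)
"""


def pending_reminders(conn: sqlite3.Connection, now: datetime) -> list:
    """Return (crash_id, engineer_email) for expired, unacknowledged downloads."""
    cutoff = int((now - RETENTION).timestamp())
    cursor = conn.execute(
        "SELECT crash_id, engineer_email FROM downloads "
        "WHERE acknowledged_at IS NULL AND ingested_at <= ? "
        "ORDER BY id",
        (cutoff,),
    )
    return cursor.fetchall()


def demo() -> list:
    """Populate an in-memory database and list who still needs a reminder."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = datetime(2023, 7, 25, tzinfo=timezone.utc)
    old = int((now - timedelta(days=31)).timestamp())
    recent = int((now - timedelta(days=5)).timestamp())
    conn.executemany(
        "INSERT INTO downloads "
        "(crash_id, engineer_email, ingested_at, acknowledged_at) "
        "VALUES (?, ?, ?, ?)",
        [
            ("crash-1", "a@example.com", old, None),      # expired, unacknowledged
            ("crash-2", "b@example.com", old, old + 60),  # expired, acknowledged
            ("crash-3", "c@example.com", recent, None),   # not yet expired
        ],
    )
    return pending_reminders(conn, now)


if __name__ == "__main__":
    print(demo())
```

A reminder function would run a query like this on a schedule and email each address returned until the acknowledgement is recorded.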

Message Size Limit

Messages sent to an Azure Storage queue can be at most 64 KB, which limits how much data we can easily transfer between functions. In some cases we truncate fields to a reasonable limit, such as 10K characters for the call stack trace, and where possible we transmit attachment files such as WatsonAnalysis.xml as a link to a storage URL instead of the contents. If we need to transmit larger metadata and attachment files, like those sometimes included in development builds, we temporarily upload the message to an Azure blob and then enqueue a smaller message containing that blob URL.
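The truncate-then-offload approach described above can be sketched as follows. The field names ("callStack", "blobUrl"), the blob naming scheme, and the upload_blob callback are hypothetical; the callback stands in for whatever storage client uploads the bytes and returns a URL.

```python
import json
import uuid
from typing import Callable

MAX_QUEUE_MESSAGE_BYTES = 64 * 1024  # Azure Storage queue message size limit
MAX_STACK_CHARS = 10_000             # truncation limit for the call stack trace


def prepare_queue_message(payload: dict,
                          upload_blob: Callable[[str, bytes], str]) -> str:
    """Return the JSON text to enqueue, offloading to blob storage if too large."""
    message = dict(payload)  # shallow copy so the caller's dict is untouched

    # Truncate the call stack field (hypothetical field name) if it is too long.
    stack = message.get("callStack")
    if isinstance(stack, str) and len(stack) > MAX_STACK_CHARS:
        message["callStack"] = stack[:MAX_STACK_CHARS]

    body = json.dumps(message)
    encoded = body.encode("utf-8")
    if len(encoded) <= MAX_QUEUE_MESSAGE_BYTES:
        return body

    # Still too large: upload the full message and enqueue only a pointer to it.
    blob_url = upload_blob(f"messages/{uuid.uuid4()}.json", encoded)
    return json.dumps({"blobUrl": blob_url})
```

The consumer would then check for the pointer field and download the blob before processing. If the queue client Base64-encodes messages, the usable payload is smaller than 64 KB, so a lower threshold would be needed.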

Automatic Requeuing of Poison Messages

Azure Functions has a built-in retry mechanism in which failures are quickly retried up to 5 times before the message is moved to a poison queue. In addition, we provide our own Azure Function to reprocess poisoned ingestion messages, specifically for cases where the crash download from Watson fails; it does not cover failures of the bug filing function. In V1 we used a SQL database to track this and allowed up to 50 additional attempts per failed crash, while V2 stores the retry count in the JSON message itself to avoid a costly external SQL dependency.
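A minimal sketch of the V2 idea, where the retry count travels inside the message, is shown below. The "retryCount" field name is hypothetical, and applying the limit of 50 to V2 is an assumption, since that figure is stated above only for V1.

```python
import json
from typing import Optional

MAX_REQUEUE_ATTEMPTS = 50  # assumed to carry over from V1


def requeue_poison_message(raw: str) -> Optional[str]:
    """Return the message to put back on the ingestion queue, or None to give up."""
    message = json.loads(raw)
    attempts = int(message.get("retryCount", 0))  # absent on the first failure
    if attempts >= MAX_REQUEUE_ATTEMPTS:
        return None  # attempts exhausted; leave the message in the poison queue
    message["retryCount"] = attempts + 1
    return json.dumps(message)
```

Because the counter is part of the message, the requeue function needs no external state to decide whether another attempt is allowed.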

This page was last modified on July 25 2023, 05:56 PM (UTC).