Migrating ODIN Parser from Native C++ to .NET C# with no downtime

History

NIPO has been developing ODIN Script for over 40 years, including the parser that parses the script into quickcode. Quickcode is a binary format that is used by the Engine to run interviews. The parser was originally developed in C, and later converted to C++.

When Nfield was originally conceived 15 years ago, NIPO chose to focus on one language and one platform: C# and .NET, with Azure as the cloud platform. However, since the parser is complex, we decided to keep using the C++ parser through interop. This maintains compatibility between Nfield and NFS and allows our users to migrate their NFS projects to Nfield with minimal effort. It also allowed us to get Nfield to market sooner by focusing on the features unique to Nfield.

For running the interviews in Nfield we wrote a new Engine in C# from the start, because this is comparatively simple. We still use the legacy engine for some functionality, such as generating the datamap.

Over time, the number of developers at NIPO with the knowledge to maintain and improve the C++ parser has shrunk (and aged!), which has become a risk to the continuity of Nfield.

The ODIN language is not well structured in the sense of Backus-Naur form, which means that a lot of custom logic is needed to interpret the meaning of a script. Some of this interpretation has developed and become accepted over the years, even where it is not necessarily logical. For example, it’s possible to have an *IF expression without a statement following it. This is pointless, but it is accepted by the ODIN parser and used in customers’ scripts.

New C# Parser

At the beginning of 2021 we made the decision to rewrite the parser in C#. By this time the legacy parser had parsed the scripts used for over 1 billion interviews. We wanted to replace it with full confidence that there would be no disruption or change of behavior. Data integrity is critical for Nfield, so we didn’t want to risk making any changes that could affect the collection of data.

By the middle of 2021 we had a parser that was able to parse a simple question, and by the start of 2022 we had a parser that was able to parse most of the common ODIN features, and we were ready to start testing it with real customer scripts.

To monitor progress we created the Script Analyzer, an Azure Function that analyzes all scripts as they are uploaded to Nfield and generates a report of any differences between the two parsers. The report also contains counts of all commands used by the scripts, so that we know which ODIN features to prioritize in the new parser. Every sprint since the introduction of the Script Analyzer, we have spent a fixed amount of time reviewing the reports and finding and fixing any reported differences. This is very important because the real customer scripts used in production are much more complex than the scripts we can conceive of for testing, so they expose many more edge cases that cause differences in the parsing.
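The command counting part of such a report can be sketched in a few lines. This is only an illustration, not the Script Analyzer itself: it assumes that a command is any asterisk followed by letters (as in *IF), and the function names are made up for the example. A real analyzer would need to respect quoting, comments and other context.

```python
import re
from collections import Counter

# Assumption: a command is an asterisk followed by letters, e.g. *IF.
# This naive pattern also matches an asterisk inside question text.
COMMAND_PATTERN = re.compile(r"\*([A-Za-z]+)")


def count_commands(script: str) -> Counter:
    """Count how often each command name appears in one script (case-insensitive)."""
    return Counter(name.upper() for name in COMMAND_PATTERN.findall(script))


def aggregate(scripts) -> Counter:
    """Sum the command counts over many scripts, e.g. everything uploaded in a sprint."""
    total = Counter()
    for script in scripts:
        total += count_commands(script)
    return total
```

Sorting the aggregated counter with most_common() gives a rough priority list: the commands at the top are the ones whose behavior matters most to get right first.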

By the end of 2022 we were ready to start using the new parser in production. Every script uploaded by the user was parsed by both parsers, and the warnings and errors from both parsers were combined. We compared the quickcode generated by both parsers and, from the middle of 2023, used the quickcode generated by the new parser whenever the comparison showed no differences. In case of differences we would fall back to the quickcode generated by the old parser. As a safety measure, we gave the user the option to explicitly choose the old parser.

We cannot directly compare the quickcode, since there are some intentional but insignificant differences. Instead we convert the quickcode from each parser back to script and compare the results. If the results are the same, we use the quickcode generated by the new parser. If there are differences, we fall back to the old parser.
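The selection logic can be summarized in a short sketch. It is not the Nfield implementation: parse_old, parse_new and decompile are hypothetical stand-ins for the two parsers and the quickcode-to-script conversion, passed in as plain functions, and the force_old flag models the user's option to choose the old parser.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ParseChoice:
    quickcode: bytes
    parser_used: str  # "new" or "old"
    matched: bool     # did both parsers decompile to the same script?


def choose_quickcode(
    script: str,
    parse_old: Callable[[str], bytes],   # hypothetical: legacy parser
    parse_new: Callable[[str], bytes],   # hypothetical: new parser
    decompile: Callable[[bytes], str],   # hypothetical: quickcode back to script
    force_old: bool = False,             # the user explicitly chose the old parser
) -> ParseChoice:
    old_qc = parse_old(script)
    new_qc = parse_new(script)
    # Compare the decompiled scripts rather than the raw bytes, because the
    # quickcode itself may differ in intentional but insignificant ways.
    matched = decompile(old_qc) == decompile(new_qc)
    if matched and not force_old:
        return ParseChoice(new_qc, "new", matched)
    return ParseChoice(old_qc, "old", matched)
```

Returning the comparison result alongside the chosen quickcode keeps the fallback silent for the user while still leaving something to log.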

Since this comparison and fallback are silent, we added information to the interview audit log which shows which version of the parser was used to generate the quickcode that was used for each interview (actually for each question of each interview). We also added system logging to show the result of the comparison, and whether the user chose to use the old parser. This can help with diagnosing any issues that may arise.
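As a rough picture of what gets recorded, the sketch below models one audit entry per question and one system log line per comparison. All field names here are assumptions made for illustration; the actual Nfield audit log and logging formats are not shown in this post.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ParserAuditEntry:
    # Hypothetical field names: one entry per question of each interview.
    interview_id: str
    question_id: str
    parser_used: str  # "new" or "old"


def comparison_log_line(script_id: str, matched: bool, user_forced_old: bool) -> str:
    """Build a structured system log line describing one parser comparison."""
    return json.dumps(
        {
            "script_id": script_id,
            "quickcode_matched": matched,
            "user_forced_old_parser": user_forced_old,
        },
        sort_keys=True,
    )


def audit_line(entry: ParserAuditEntry) -> str:
    """Serialize an audit entry so it can be appended to an audit log."""
    return json.dumps(asdict(entry), sort_keys=True)
```

With records like these, an unexpected result in an interview can be traced back to the parser that produced the quickcode for that specific question.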

By 2025 the reports were showing so few differences between the two parsers that we were confident enough that users should not need to explicitly choose to use the old parser, and so we removed that option, keeping the automatic fallback in case of differences. Right now (mid 2025) we are removing the automatic fallback and will soon completely remove the old parser from the code base.

Decisions

During this development phase, every time we encountered a difference between the behavior of the two parsers we had to make a decision on how to handle it.

When we considered the old parser behavior to be correct, we’d implement the same behavior in the new parser.

When we considered the old parser behavior to be incorrect or undesirable, we had to decide:

  1. implement the same behavior in the new parser - this creates intentionally “bad” code, but allows us to use the new parser with existing scripts without changing behavior
  2. implement the “better” behavior in the new parser - this makes the code more logical, but means that we will fall back to using the old parser. In this case we’d show the user a warning about future deprecation
  3. implement the expected behavior in both the old and new parser - this is more effort, but makes both parsers better and more consistent - however it requires that users update their scripts, or accept the differences

The choice depends on how often the unexpected behavior is used in real customer scripts - as determined by the Script Analyzer reports - and on the effort and development capacity needed to update the old parser.

There were many differences in areas such as whitespace handling, which is sometimes significant and sometimes not.

Business Continuity

Alongside this migration to the new parser, other work on Nfield continues as usual, including adding new functionality and fixing bugs in the parser. These changes need to be implemented in both parsers, which further complicates the migration.

Engine

As mentioned earlier, Nfield has been using the C# Engine for running interviews since the start of Nfield. However, we still use the legacy C++ Engine in the Nfield Manager for generating the datamap. We are also migrating this to C#, but that’s another story…

This post is licensed under CC BY 4.0 by the author.