Feature Toggle - Nightmares
We were working on a custom algorithm to optimally grab images given a list of areas. Every image grabbed is written into an object store with a generated hashkey. The hashkey can be used to query the recorded data when we want to play it back.
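To make the idea concrete, here is a minimal sketch of content-addressed recording and playback. The names (`record_image`, `playback`) and the key scheme (hashing the grab metadata) are assumptions for illustration, not the actual system; but it shows why playback only works while key generation stays deterministic:

```python
import hashlib
import json

def record_image(store, image_bytes, metadata):
    # Derive a deterministic hashkey from the metadata that produced the grab.
    # 'store' is any dict-like object store (hypothetical stand-in).
    key = hashlib.sha256(json.dumps(metadata, sort_keys=True).encode()).hexdigest()
    store[key] = image_bytes
    return key

def playback(store, metadata):
    # Re-derive the key the same way; this only finds the image if the
    # algorithm producing the metadata/key is unchanged.
    key = hashlib.sha256(json.dumps(metadata, sort_keys=True).encode()).hexdigest()
    return store.get(key)
```

The fragility is visible already: any change to how the key is derived makes previously recorded data unreachable.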
The software was a monolith and the algorithm was shipped with it. So, any new fixes would be shipped with new software versions, and users would still be able to play back the same recorded data.
def customAlgorithm():
    locations = []
    # actual algo...
    return locations
A day came when we rolled out a bug fix in the algorithm. But we broke backward compatibility. Users had lots of existing recorded data which they wanted to play back. Since the algorithm had changed, the generated hashkeys didn't match the ones in the object store.
We thought it was probably a one-off scenario, so we added a feature toggle: users would have to manually (pain) figure out which version the data was recorded on, update the toggle config (XML), and rerun.
def customAlgorithm():
    locs = []
    if version == '1.0':
        locs = oldAlgo()
    else:
        locs = newAlgo()
    return locs
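The "update the toggle config (XML)" step might look something like the sketch below. The config shape (`<config><algoVersion>…</algoVersion></config>`) and the function name are hypothetical, since the post doesn't show the actual file:

```python
import xml.etree.ElementTree as ET

def read_toggle_version(xml_text):
    # Pull the algo version the user manually set in the toggle config.
    # Falls back to 'latest' when the element is absent (assumed default).
    root = ET.fromstring(xml_text)
    node = root.find("algoVersion")
    return node.text if node is not None else "latest"
```

Every recorded dataset needed the right value here before a rerun, which is exactly the manual pain described above.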
And, well, there was another bug fix:
def customAlgorithm():
    locs = []
    if version == '1.0':
        locs = Algo_1_0()
    elif version == '2.0':
        locs = Algo_2_0()
    else:
        locs = latestAlgo()
    return locs
And yeah, the story went on for one more iteration and we had to STOP! It was already becoming a nightmare to maintain the code.
It wasn't just manually figuring out the toggle that was creating pain. To enable code reuse, the algorithm had to be refactored multiple times so that parts of it could be reused across versions. In addition, the number of test suites we had to manage had gone up significantly, as there were multiple users each using a different version, and oh yeah, there were version-specific fixes released as patches.
Finally, we decided to write all the metadata related to a recording into an embedded database. On playback we get the information from the database rather than by re-running the code/algorithm.
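A minimal sketch of that shape, using SQLite as a stand-in embedded database (the actual database, schema, and column names are assumptions): record time writes everything playback will need, and playback becomes a lookup instead of a re-computation.

```python
import sqlite3

def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recordings ("
        "hashkey TEXT PRIMARY KEY, algo_version TEXT, metadata TEXT)"
    )
    return conn

def record(conn, hashkey, algo_version, metadata_json):
    # At record time, persist everything playback will need,
    # including which algorithm version produced it.
    conn.execute("INSERT OR REPLACE INTO recordings VALUES (?, ?, ?)",
                 (hashkey, algo_version, metadata_json))
    conn.commit()

def playback_lookup(conn, hashkey):
    # Playback reads stored metadata; the algorithm is never re-run,
    # so algorithm changes can no longer break old recordings.
    return conn.execute(
        "SELECT algo_version, metadata FROM recordings WHERE hashkey = ?",
        (hashkey,)).fetchone()
```

The design choice this illustrates: the version branching disappears from the algorithm code, at the cost of maintaining a second (playback) flow and a schema.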
In hindsight, the database feels like an obvious solution, right? Well, the devil is in the details; in our case it was about effort (read: time). The playback infrastructure based on the database took ~8 weeks to build for the first version. It had to handle numerous (tens of) use cases. Previously, we only had one flow to maintain, i.e. the flow which ran the use cases plus the shipped algo. But now we have to manage TWO different flows, one for recording and one for playback, multiplied by the number of use cases.
It's been almost 3 years since we rolled out this solution to production. When we look back, it was a very good decision to build two different workflows. The code is more structured and maintainable.
At times, a short-term goal/fix takes priority. But it's always good to step back, look at the pain points, and have an item in your backlog to find a better solution. The priority should be driven by feedback: remember the 80/20 rule.
80 percent of customers only use 20 percent of the features in the software they've bought.
BTW: the database schema has changed a lot with the addition of multiple use cases, which also needs version management. And every new use case discussion now has two parts to cover; some problems are good to have (read: tradeoffs).
Ref: Feature Toggle