Where does the time go in an EV conversion? I've asked this before. I'm starting to believe that, if you choose to develop your own battery management software, then most of it will go to weird software issues.
Today, for example, a string of battery management units (BMUs) installed in the car seemed not to download a new version of software. This is particularly irritating, because we pride ourselves on the ability to download new software at any time, and taking the battery boxes out of the car to JTAG program each BMU is a fearsome amount of work.
So we went to repeat what we thought we did on the rollbar box, because if needed, we can access those cells easily. We loaded in our TestICal software, because it has a few options that the regular monitor software doesn't have. That downloaded successfully, but when we tested it, it behaved extremely strangely. For example, most of the BMUs thought that they had identity number 49, when the last ID in the box was definitely 45. Some reported their voltages in hex, others in decimal as standard. There were other weirdnesses that I don't recall now.
We eventually traced this to the software not receiving its initialisation call, so it was operating on uninitialised RAM. For a PC program, this would guarantee spectacular failure, but these programs have battery-backed RAM, so values left over from earlier runs normally survive.
It was only because Weber had recently moved locations in RAM that this showed up. The reason for the lack of initialisation was traced to a discrepancy in version between two pieces of software. For reasons I don't want to go into here, we have three pieces of software running at once: BSL1 (bootstrap loader number 1), BSL2, and either the monitor or the TestICal software. There is an easy way to ensure that this won't happen again in future, and we'll fix that soon.
But now we realised that the reason the original download failed was another incompatibility between these three pieces of software. Worse, this incompatibility meant that when the "password" sequence for a download was detected, the program would jump off to one of the other pieces using the wrong address. In fact, it ended up in the middle of an instruction. This meant that we could not download new software, and would have to JTAG each BMU separately, which necessitated pulling out all the battery boxes again. Surely there was another way!
The oldest bootstrap loader, BSL1, was still in these BMUs, but would immediately hand over control to the monitor program. The monitor program only listened for the password that is handled by BSL2. We had intended for it to handle the password for BSL1 as well, "just in case", but hadn't gotten around to it. Well, it would have been really good if we had! If we could find a way to cause the BMUs to activate their watchdog timers, then they would not allow the monitor to take control, and BSL1 would be there to accept new downloads. But the only way the watchdog timer takes effect is if something goes badly wrong, like a crash. I joked about getting one of those "zapper" machines that unscrupulous people use for crashing poker machines, sometimes causing them to eject all their coins. We also considered whether we could engineer a buffer overflow, but we'd fixed that problem years ago.
The astute reader will have realised by now that the answer lay with the monitor program jumping into the middle of an instruction when it detected the download password. We almost missed this, and had resigned ourselves to pulling out the battery boxes "one last time". But sure enough, when the monitor jumped into the middle of an instruction, it crashed and the watchdog timer reset the processor. BSL2 then wisely refuses to allow the monitor to run, so BSL1 is available to download new software. However, when the first BMU crashes, it doesn't pass the complete password to the next BMU, so only one BMU becomes able to download per attempt. But we found that by downloading a second time, the second BMU became able to download. So we could just attempt 109 downloads, and all would be good! On the 110th download, we could load a new BSL2, which would match the monitor, and so it would be able to download a new monitor or TestICal from then on.
In fact, we found we didn't have to wait the ~2 minutes for a complete download; we just needed to send the password (4 control characters). It only took a few seconds to send this by hand using a terminal program, so we could simply repeat it until all the BMUs were ready for downloads.
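To make the "one BMU per attempt" idea concrete, here is a toy simulation of the plan. It is only a sketch of the theory, not our actual firmware: the `Bmu` class and function names are made up for illustration, it assumes the string holds 109 BMUs (the number of attempts mentioned above), and it assumes a BMU that is already waiting for a download passes the password along intact.

```python
from dataclasses import dataclass


@dataclass
class Bmu:
    # True once this BMU has crashed, been reset by its watchdog,
    # and is sitting in the bootstrap loader waiting for a download.
    ready: bool = False


def send_password(chain):
    """Send one password down the daisy chain.

    Returns the index of the BMU that became ready, or None if every
    BMU was already ready.
    """
    for index, bmu in enumerate(chain):
        if bmu.ready:
            # Assumption: a BMU already in the loader forwards the password.
            continue
        # The monitor sees the password, jumps to the wrong address,
        # crashes, and the watchdog resets the processor.
        bmu.ready = True
        # The crash cuts the password short, so BMUs further down the
        # chain never see a complete one on this attempt.
        return index
    return None


def attempts_to_free_all(bmu_count):
    """Count how many passwords must be sent before every BMU is ready."""
    chain = [Bmu() for _ in range(bmu_count)]
    attempts = 0
    while send_password(chain) is not None:
        attempts += 1
    return attempts


if __name__ == "__main__":
    print(attempts_to_free_all(109))
```

Under those assumptions the count equals the number of BMUs in the string, which is why sending only the 4-character password matters: at ~2 minutes per full download the same job would take hours.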
While this is a great theory, when we tried it, it didn't work. We tried it on a second set of battery boxes (nothing to lose, right?), and it also failed. One of the battery boxes has its lid readily able to be removed, so we did that, and used JTAG to figure out what was going on. Alas, this caused more confusion. It turns out the clock calibration on the first BMU was off by a few percent, which corrupted only BX bytes (hex) into 9X, and didn't seem to affect other bit patterns at all. How weird is that? Well, par for the course today, it seems.
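For what it's worth, the BX to 9X corruption amounts to a single flipped bit. Reading BX and 9X as hex bytes whose high nibble is B (binary 1011) or 9 (binary 1001), the short check below shows which bit positions differ. The function name is just for illustration, and this doesn't explain why the clock error hit only that pattern; it only narrows down what changed.

```python
def flipped_bits(sent, received):
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = sent ^ received
    return [bit for bit in range(8) if diff & (1 << bit)]


if __name__ == "__main__":
    # Try a few low nibbles; the low nibble is unchanged in every case.
    for low in (0x0, 0x5, 0xF):
        sent = 0xB0 | low
        received = 0x90 | low
        print(f"{sent:02X} -> {received:02X}: bits {flipped_bits(sent, received)}")
```

Every case reports bit 5 only, so the damage is one bit in the upper half of the byte being read as 0 instead of 1.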
After this was fixed, the software still would not propagate past the third BMU. I suspect its speed calibration was also off a little, but by now it was after 10pm and I had to go home. But these BMUs have had many successful downloads, so it seems ridiculous that they would corrupt downloads just today.
Stay tuned to find out whether Weber and Coulomb have to take out the battery boxes after all, or whether they can do it all with just one easy cover removed!