To many folks, Excel is their operating system. It's not just spreadsheet software; it's where they do their data collection, analysis, visualization, reporting, and processing. But the software, designed to be flexible, is not without limitations. So in line with losing Covid data because of file sizes and formats, we now have this:
Each gene is given a name and alphanumeric code, known as a symbol, which scientists use to coordinate research. But over the past year or so, some 27 human genes have been renamed, all because Microsoft Excel kept misreading their symbols as dates.
The problem isn’t as unexpected as it first sounds. Excel is a behemoth in the spreadsheet world and is regularly used by scientists to track their work and even conduct clinical trials. But its default settings were designed with more mundane applications in mind, so when a user inputs a gene’s alphanumeric symbol into a spreadsheet, like MARCH1 — short for “Membrane Associated Ring-CH-Type Finger 1” — Excel converts that into a date: 1-Mar.
This is extremely frustrating, even dangerous, corrupting data that scientists have to sort through by hand to restore. It’s also surprisingly widespread and affects even peer-reviewed scientific work. One study from 2016 examined genetic data shared alongside 3,597 published papers and found that roughly one-fifth had been affected by Excel errors.
They could have just formatted an Excel column or row as text instead of relying on Excel to autoformat the cells, but you know... renaming genes is easier.
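For anyone who would rather check their data than rename it, here is a minimal Python sketch that flags symbols a spreadsheet might read as dates, so you know which columns to format as text before pasting anything in. The month list and the pattern are my own rough approximation, not Excel's actual parsing rules, and the names `looks_like_date` and `flag_symbols` are made up for illustration.

```python
import re

# Month names and abbreviations a spreadsheet might treat as the start of a date.
# Assumption: this list only approximates Excel's behavior; it is not Excel's parser.
MONTHS = (
    "JAN", "JANUARY", "FEB", "FEBRUARY", "MAR", "MARCH", "APR", "APRIL",
    "MAY", "JUN", "JUNE", "JUL", "JULY", "AUG", "AUGUST",
    "SEP", "SEPT", "SEPTEMBER", "OCT", "OCTOBER", "NOV", "NOVEMBER",
    "DEC", "DECEMBER",
)

# A month name, an optional separator, then a one- or two-digit number.
DATE_LIKE = re.compile(
    r"^(?:%s)[-/ ]?(\d{1,2})$" % "|".join(MONTHS), re.IGNORECASE
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol resembles a month plus a day number (1-31)."""
    match = DATE_LIKE.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(1)) <= 31


def flag_symbols(symbols):
    """Return the symbols that are at risk of being autoconverted to dates."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    genes = ["MARCH1", "SEPT2", "TP53", "BRCA1", "DEC1"]
    print(flag_symbols(genes))
```

This heuristic will miss some cases and may flag harmless ones, so treat a hit as a prompt to format that column as text, not as proof the data is already corrupted.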