Recovering and Reusing Legacy XML / JSON Across Years of Content
Several clients came to me with large volumes of legacy content that had built up over many years:
- Inconsistent XML or JSON structures
- Partial translations
- Files no longer compatible with modern TMS setups
In some cases, this content was still actively referenced or updated.
What I did
I reverse-engineered file structures and built tooling to recover and realign legacy content so it could be reused safely.
This typically involved:
- Analysing and reconstructing schemas
- Programmatically aligning source and target content
- Recovering usable translations from historical data
- Producing validated, future-proof outputs
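As an illustration, the alignment and recovery steps can be sketched in Python. This is a minimal sketch, not the actual client tooling: it assumes JSON files in which source and target share the same key paths, and the names `flatten` and `align` are hypothetical helpers introduced for this example.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (key_path, text) pairs for every string leaf in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node
    # Non-string leaves (numbers, booleans, null) are ignored in this sketch.


def align(source: Any, target: Any) -> Dict[str, Any]:
    """Pair source and target strings by key path and report the gaps."""
    src = dict(flatten(source))
    tgt = dict(flatten(target))
    # A translation counts as recoverable only if the key exists in both
    # files and the target text is not empty.
    aligned = {
        key: {"source": src[key], "target": tgt[key]}
        for key in src
        if key in tgt and tgt[key].strip()
    }
    missing = sorted(key for key in src if key not in aligned)   # needs translating
    orphaned = sorted(key for key in tgt if key not in src)      # no longer in source
    return {"aligned": aligned, "missing": missing, "orphaned": orphaned}


# Hypothetical sample data standing in for two legacy files.
legacy_en = json.loads('{"menu": {"open": "Open", "save": "Save"}, "title": "Settings"}')
legacy_fr = json.loads('{"menu": {"open": "Ouvrir", "save": ""}, "old": "Ancien"}')

report = align(legacy_en, legacy_fr)
print(json.dumps(report, indent=2, ensure_ascii=False))
```

In this sample, `menu.open` is recovered as a usable pair, `menu.save` and `title` are reported as missing, and `old` is flagged as orphaned. Real legacy files rarely share key paths this cleanly, so a production version would need a mapping layer between old and new structures, which this sketch leaves out.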
Scale & longevity
- Content spanning many years
- Tens of thousands of segments
- Multiple languages and formats
The outcome
- Legacy content became usable again
- Clients avoided large-scale retranslation
- Older content could now be maintained alongside new material
This work turned long-term technical debt into usable assets.
