I recently generated the parent-child and ancestor-progeny data for the ICD-10 diagnostic codes. You can find them on the GitHub repo I set up.
While generating the codes was a pretty straightforward matter of =vlookup() and =left(), I discovered only after exporting from Excel 2010 to a tab-separated text format that Excel or Word had converted my ASCII dash (-) and ellipses (...) into special dash (–) and ellipsis (…) characters somewhere along the way.
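One way to undo that kind of conversion is to map the typographic characters back to ASCII. The following is only a minimal Python sketch, not what I actually ran; the function name `to_ascii_punctuation` and the sample string are made up for illustration, and it handles only the two characters mentioned above.

```python
# Hypothetical helper: map the two typographic characters back to ASCII.
REPLACEMENTS = {
    "\u2013": "-",    # en dash -> ASCII hyphen-minus
    "\u2026": "...",  # horizontal ellipsis -> three ASCII periods
}

# str.maketrans accepts single-character keys and string values of any length.
TRANSLATION_TABLE = str.maketrans(REPLACEMENTS)


def to_ascii_punctuation(text: str) -> str:
    """Return text with the en dash and ellipsis replaced by ASCII equivalents."""
    return text.translate(TRANSLATION_TABLE)


if __name__ == "__main__":
    # Made-up sample row, not taken from the real data file.
    sample = "A00\u2013A09\tSome description\u2026"
    print(to_ascii_punctuation(sample))
```

Any other character that Excel or Word substituted would need its own entry in the table.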
I was reviewing the generated file in Notepad++ and decided to determine just which characters existed in the CDC-provided data. I built up the following PCRE regex one piece at a time until I had the exact list (case insensitive):
[^-0-9A-Z \t\r\n.,\[\]/()'%<>=+%]
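The same check can be scripted outside Notepad++. Here is a rough Python sketch, assuming the character class above was meant to allow tab, carriage return, newline and literal square brackets. Python's `re` module is not PCRE, but a negated character class like this one behaves the same way. The function name `unexpected_characters` is hypothetical.

```python
import re

# Negated class: matches any single character NOT in the expected set.
# re.ASCII keeps the case-insensitive matching limited to ASCII letters.
UNEXPECTED = re.compile(
    r"[^-0-9A-Z \t\r\n.,\[\]/()'%<>=+]",
    re.IGNORECASE | re.ASCII,
)


def unexpected_characters(text: str) -> list[str]:
    """Return the distinct characters outside the expected set, sorted by code point."""
    return sorted(set(UNEXPECTED.findall(text)))


if __name__ == "__main__":
    # Made-up sample containing an en dash and an ellipsis character.
    sample = "A00\u2013A09\tSome description (x)\u2026\r\n"
    print(unexpected_characters(sample))
```

An empty result means every character in the text is already in the list.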
Of course, upon reflection, I just realized that J has a great way to find the characters:
/:~~.1!:1<'C:/Path/ICD10/icd10withHierarchy.txt'
%
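For anyone who does not read J: 1!:1 reads the file, ~. keeps the unique items (the nub), and /:~ sorts them. A rough Python equivalent is sketched below. The function name `sorted_unique_characters` is made up, and the file path in the comment is only a placeholder. Note that J works on the raw bytes of the file, while this sketch works on decoded text, so results can differ for non-ASCII input.

```python
def sorted_unique_characters(text: str) -> str:
    """Return each distinct character of text once, sorted by code point."""
    return "".join(sorted(set(text)))


if __name__ == "__main__":
    # To run this against a real file, read it first, for example:
    #   with open("icd10withHierarchy.txt", encoding="utf-8") as handle:
    #       text = handle.read()
    # (placeholder path and an assumed encoding)
    text = "banana-A"
    print(repr(sorted_unique_characters(text)))
```

Printing with repr() makes whitespace such as tabs and line endings visible in the result.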