To aid a neutral party in assessing approaches to digital dictionaries for Japanese, I have posted an HTML file displaying 10,000+ of the first entries in Kanjidic2 at
http://kanji.aule-browser.com/kanjidic2-m12.html
I have restricted the dump to the kanji, the UCS code and a maximum of 12 of a possible 14 meanings.
There are fewer than 10,200 entries because, among the first 12,155 entries, many had no XML meaning content without a language attribute (in Kanjidic2, a meaning with no language attribute is the English one). Those few thousand may have English translations in markup previously used for foreign languages.
The same data can be found as a CSV file at
http://kanji.aule-browser.com/kanjidic2-m12.csv
with a three-line header, which you may have to alter for your purposes.
The Kanjidic2 XML file was parsed using the Curl XDM library from curl.com (Japanese-language site: http://www.curlap.com).
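For readers without Curl, the same filtering can be sketched in Python with the standard library. This is not the Curl XDM code used for the posted files, only a minimal illustration under stated assumptions: the element and attribute names (`character`, `literal`, `cp_value` with `cp_type="ucs"`, `meaning` with an optional `m_lang`) follow the published Kanjidic2 layout, while the function name `extract_entries` and the tiny `SAMPLE` document are made up for the example.

```python
import xml.etree.ElementTree as ET

MAX_MEANINGS = 12  # the posted dump keeps at most 12 meanings per kanji

def extract_entries(xml_text, max_meanings=MAX_MEANINGS):
    """Return (kanji, ucs_code, meanings) tuples, skipping entries that
    have no meaning element without a language attribute."""
    root = ET.fromstring(xml_text)
    entries = []
    for ch in root.iter("character"):
        literal = ch.findtext("literal")
        # Pick the UCS code point out of the available cp_value elements.
        ucs = None
        for cp in ch.iter("cp_value"):
            if cp.get("cp_type") == "ucs":
                ucs = cp.text
                break
        # Keep only meanings that carry no m_lang attribute.
        meanings = [m.text for m in ch.iter("meaning")
                    if m.get("m_lang") is None and m.text]
        if not meanings:
            continue  # entry dropped, as in the posted dump
        entries.append((literal, ucs, meanings[:max_meanings]))
    return entries

# Hypothetical two-entry sample in the Kanjidic2 shape (not real file content).
SAMPLE = """<kanjidic2>
  <character>
    <literal>亜</literal>
    <codepoint><cp_value cp_type="ucs">4e9c</cp_value></codepoint>
    <reading_meaning><rmgroup>
      <meaning>Asia</meaning>
      <meaning>rank next</meaning>
      <meaning m_lang="fr">Asie</meaning>
    </rmgroup></reading_meaning>
  </character>
  <character>
    <literal>X</literal>
    <codepoint><cp_value cp_type="ucs">0058</cp_value></codepoint>
    <reading_meaning><rmgroup>
      <meaning m_lang="fr">exemple</meaning>
    </rmgroup></reading_meaning>
  </character>
</kanjidic2>"""

if __name__ == "__main__":
    for kanji, ucs, meanings in extract_entries(SAMPLE):
        print(kanji, ucs, "; ".join(meanings))
```

The second sample entry has only a meaning tagged with a language, so it is skipped, which is the same effect that reduces the 12,155 entries to fewer than 10,200 in the dump.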
As it stands, the HTML file should be useful for building custom Anki flashcards (themselves stored as SQLite). I will be using a variant of the CSV output to construct dictionary software with annotations and spaced-repetition options. Curl has both CSV and SQLite libraries in addition to the XML libraries.
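As a rough illustration of the CSV-to-SQLite step, here is a minimal Python sketch. It assumes each data row is laid out as kanji, UCS code, then the meanings, which you should check against the actual file; the table name `kanji`, the function `load_csv` and the sample rows are hypothetical, and the table is a plain SQLite table, not Anki's own schema.

```python
import csv
import io
import sqlite3

HEADER_LINES = 3  # the posted CSV starts with a three-line header

def load_csv(csv_text, conn, header_lines=HEADER_LINES):
    """Load rows of (kanji, ucs, meaning, meaning, ...) into a SQLite table.
    The column order is an assumption about the CSV layout."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kanji "
        "(literal TEXT, ucs TEXT, meanings TEXT)"
    )
    reader = csv.reader(io.StringIO(csv_text))
    count = 0
    for i, row in enumerate(reader):
        if i < header_lines or len(row) < 2:
            continue  # skip the header and blank or malformed rows
        literal, ucs, *meanings = row
        conn.execute(
            "INSERT INTO kanji VALUES (?, ?, ?)",
            (literal, ucs, "; ".join(m for m in meanings if m)),
        )
        count += 1
    conn.commit()
    return count

# Hypothetical sample with a three-line header (not the real file content).
SAMPLE_CSV = (
    "header line 1\n"
    "header line 2\n"
    "header line 3\n"
    "亜,4e9c,Asia,rank next\n"
    "唖,5516,mute,dumb\n"
)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_csv(SAMPLE_CSV, conn), "rows loaded")
```

Storing the meanings as one joined text column keeps the sketch short; a separate meanings table would suit annotation or spaced-repetition data better.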