GlobalQuran is a well-written ecosystem of UI pages, widgets, APIs, and search that anyone can use on their site.
This page explores and evaluates new datasets that may or may not end up being used in GlobalQuran.
- [Word2Word data]
- [Buck data]
- [Corpus grammar data]
- Word2Word data:
- Objective: For each Quran word, get the English meaning. No compound-word meanings.
- Demo: [http://qurandev.github.com/ http://qurandev.github.com/] -- see the 4th column
- Challenges: Because the data is machine-matched, it is not 100% accurate. For compound words, some of the meanings are prefixed with **; these mark places where one meaning is duplicated across side-by-side words.
- How was the data generated?
- Started with the original data from [http://allahsquran.com/learn/ http://allahsquran.com/learn/]
- Snipped out all the meanings from the JS files. Ex: [http://allahsquran.com/learn/ayas-s112d7q1f0o4.js http://allahsquran.com/learn/ayas-s112d7q1f0o4.js]
- Identified every place where the number of words in the Tanzil text does not match the number of meaning segments.
- Where there was a mismatch, either fixed it by hand (in a few cases) or duplicated the meaning, prefixed with **.
- Note: Apparently quran.com and corpus.quran.com also had to go through these word2word data cleanups.
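
The mismatch check and the ** duplication described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the `ayas` structure, the function names, and the placeholder tokens are all hypothetical. It also assumes that the missing meanings belong to the trailing words and that only the duplicated copy gets the ** prefix; the real cleanup may have placed the duplicates differently.

```python
MARKER = "**"  # prefix used on this page for duplicated meanings


def find_mismatches(ayas):
    """Return keys of ayas whose word count differs from the meaning count.

    `ayas` maps a hypothetical (sura, aya) key to a (words, meanings) pair.
    """
    return [key for key, (words, meanings) in ayas.items()
            if len(words) != len(meanings)]


def pad_meanings(words, meanings):
    """Duplicate the last meaning, prefixed with MARKER, until counts match.

    Assumption: the missing meanings belong to the trailing words. Ayas with
    no meanings, or with more meanings than words, are returned unchanged
    and would need a fix by hand.
    """
    padded = list(meanings)
    if not padded or len(padded) >= len(words):
        return padded
    last = padded[-1]
    while len(padded) < len(words):
        padded.append(MARKER + last)
    return padded


# Placeholder tokens, not real Quran text or translations.
ayas = {
    (1, 1): (["w1", "w2", "w3"], ["m1", "m2", "m3"]),
    (1, 2): (["w1", "w2", "w3"], ["m1", "m2"]),
}

mismatched = find_mismatches(ayas)
fixed = {key: pad_meanings(*ayas[key]) for key in mismatched}
print(mismatched)
print(fixed)
```

Listing the mismatched ayas first keeps the hand-fix cases visible, so the automatic duplication is only applied where a manual correction is not practical.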