GlobalQuran is a well-written ecosystem of UI pages, widgets, APIs, and search that anyone can use on their site.

This page explores and evaluates new datasets that may or may not end up being used in GlobalQuran.

  • [Word2Word data]
  • [Buck data]
  • [Corpus grammar data]


  • Word2Word data:
    • Objective: For each Quran word, get its English meaning. No compound-word meanings.
    • Demo: [http://qurandev.github.com/ http://qurandev.github.com/] (see the 4th column).
    • Challenges: Because the data is machine-matched, it is not 100% accurate. For compound words, some meanings are prefixed with **. These mark places where one meaning has been duplicated across adjacent words.
    • How the data was generated:
      • Started with the original data from [http://allahsquran.com/learn/ http://allahsquran.com/learn/].
      • Extracted all meanings from the JavaScript files. Example: [http://allahsquran.com/learn/ayas-s112d7q1f0o4.js http://allahsquran.com/learn/ayas-s112d7q1f0o4.js]
      • Identified every place where the number of words in the Tanzil text does not match the number of meaning segments.
      • Where there was a mismatch, either fixed it by hand (in a few cases) or duplicated the meaning with ** prepended.
    • Note: Apparently quran.com and corpus.quran.com also had to go through similar word-by-word data cleanups.
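The alignment steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the script actually used to build the data: the verse keys, the function names, and especially the `spans` input (how many Tanzil words each meaning segment covers, which in practice would come from hand inspection) are hypothetical. It also assumes that the first word of a compound keeps the plain meaning and each following word receives the `**`-prefixed copy; the original notes do not say which word gets the prefix.

```python
from typing import Dict, List, Tuple

# Hypothetical verse key: (surah number, ayah number).
VerseKey = Tuple[int, int]

# Marker for a meaning that was copied onto an adjacent word.
DUP_PREFIX = "**"


def find_mismatches(
    tanzil_words: Dict[VerseKey, List[str]],
    meanings: Dict[VerseKey, List[str]],
) -> List[VerseKey]:
    """Return verses whose Tanzil word count differs from their meaning count."""
    return [
        key
        for key, words in sorted(tanzil_words.items())
        if len(words) != len(meanings.get(key, []))
    ]


def expand_meanings(segments: List[str], spans: List[int]) -> List[str]:
    """Give every Tanzil word its own meaning.

    spans[i] is the (hypothetical, hand-supplied) number of Tanzil words
    that segment i covers. The first word keeps the meaning unchanged;
    each extra word gets a copy prefixed with DUP_PREFIX.
    """
    if len(spans) != len(segments):
        raise ValueError("need exactly one span per meaning segment")
    out: List[str] = []
    for meaning, span in zip(segments, spans):
        if span < 1:
            raise ValueError("each segment must cover at least one word")
        out.append(meaning)
        out.extend(DUP_PREFIX + meaning for _ in range(span - 1))
    return out


def is_duplicate(meaning: str) -> bool:
    """True if this meaning was copied from a neighbouring word."""
    return meaning.startswith(DUP_PREFIX)


if __name__ == "__main__":
    # Placeholder data only: "w1".."w4" stand in for Tanzil words,
    # "m1".."m4" for English meaning segments.
    words = {(1, 1): ["w1", "w2", "w3"], (1, 2): ["w4"]}
    meanings = {(1, 1): ["m1", "m2"], (1, 2): ["m4"]}
    print(find_mismatches(words, meanings))      # [(1, 1)]
    print(expand_meanings(["m1", "m2"], [1, 2]))  # ['m1', 'm2', '**m2']
```

Keeping the duplicate as a prefixed copy, rather than dropping it, preserves a one-to-one word/meaning alignment, while `is_duplicate` lets a consumer such as a display column hide or grey out the repeated meanings.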
