GlobalQuran is a well-written ecosystem of UI pages, widgets, APIs, and search that anyone can use on their own site.
This page exists to explore and evaluate new datasets that may or may not end up being used in GlobalQuran.
- [Word2Word data]
- [Buck data]
- [Corpus grammar data]
- [Transliteration data]
- Word2Word data:
  - Objective: for each Quran word, get its own English meaning, with no combined meanings for compound (multi-word) phrases.
  - Reliability: *** (3/5), almost as good as the source
  - Demo: [http://qurandev.github.com/ http://qurandev.github.com/] (see the 4th column)
  - Challenges: because words and meanings were matched by machine, the data may not be 100% accurate. For compound words, most meanings are prefixed with **; these mark places where one meaning was duplicated across adjacent words.
  - How was the data generated?
    - Started with the original data from [http://allahsquran.com/learn/ http://allahsquran.com/learn/].
    - Extracted all meanings from the JS files. Example: [http://allahsquran.com/learn/ayas-s112d7q1f0o4.js http://allahsquran.com/learn/ayas-s112d7q1f0o4.js]
    - Identified every place where the number of words in Tanzil differs from the number of meaning segments.
    - Where they differ, either fixed the entry by hand (in a few cases) or prepended ** to the meaning and duplicated it.
  - Note: quran.com and corpus.quran.com apparently also had to go through similar word2word data cleanups.
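The cleanup steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script that was actually used: `hasMismatch`, `expandMeanings`, `alignVerse` and the `spans` array (how many Tanzil words each meaning segment covers) are hypothetical names, and it assumes every copy of a duplicated meaning gets the `**` prefix.

```javascript
// Sketch of the word2word cleanup: detect word/meaning count mismatches and
// expand multi-word meanings with a "**" prefix.
// ASSUMPTION: `spans` is a hypothetical input giving, for each meaning segment,
// how many Tanzil words it covers; it is not part of the original data.

// True when a verse needs attention: word and segment counts differ.
function hasMismatch(words, meanings) {
  return words.length !== meanings.length;
}

// Returns one meaning per word. A segment covering n > 1 words is prefixed
// with "**" and repeated n times (assumed: every copy gets the prefix).
function expandMeanings(meanings, spans) {
  if (meanings.length !== spans.length) {
    throw new Error("each meaning segment needs a span count");
  }
  const out = [];
  meanings.forEach((meaning, i) => {
    const n = spans[i];
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`invalid span ${n} for segment ${i}`);
    }
    if (n === 1) {
      out.push(meaning);
    } else {
      for (let k = 0; k < n; k++) out.push("**" + meaning);
    }
  });
  return out;
}

// Pairs each Tanzil word with a meaning; throws if the verse still needs a
// hand fix after expansion.
function alignVerse(words, meanings, spans) {
  const expanded = hasMismatch(words, meanings)
    ? expandMeanings(meanings, spans)
    : meanings;
  if (expanded.length !== words.length) {
    throw new Error("still mismatched: fix this verse by hand");
  }
  return words.map((word, i) => ({ word, meaning: expanded[i] }));
}

// Usage with placeholder data: three words, two segments, the second
// segment covering two words.
console.log(alignVerse(["w1", "w2", "w3"], ["m1", "m2"], [1, 2]));
```

Throwing on a leftover mismatch mirrors the "fixed by hand in a few cases" step: anything the automatic expansion cannot reconcile is surfaced rather than silently misaligned.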
- Buck data:
  - Objective: instead of passing the Tanzil Quran data as Unicode, pass it as ASCII, using a one-to-one mapping from Arabic Unicode characters to ASCII characters that can be converted back and forth without loss of fidelity.
  - Reliability: **** (4/5), as good as Tanzil, the source
  - Demo: [http://qurandev.github.com/ http://qurandev.github.com/] (see the 3rd column)
  - Advantages:
    - Buck uses less bandwidth: in UTF-8 each ASCII character takes 1 byte, while each Arabic letter or diacritic takes 2.
    - In JavaScript, you can search through the entire Buck Quran text in one shot, which is more intuitive than searching Arabic text.
    - Converting Buck to Arabic and Arabic to Buck is a simple JS call. Play with a live sample here: [http://jsfiddle.net/BrxJP/ http://jsfiddle.net/BrxJP/].
    - You can strip all vowels from the Buck text in a few milliseconds. Why do this? It lets you search in JavaScript while ignoring tashkeel differences (fatha, damma, kasra), which leads to more hits.
    - Regex plus Buck text allows big optimizations: all searches can run locally, with no server call. Demo: [http://qurandev.appspot.com/ http://qurandev.appspot.com/]
  - How was the data generated? Just a one-to-one mapping using [http://corpus.quran.com/java/buckwalter.jsp http://corpus.quran.com/java/buckwalter.jsp]
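The conversion, vowel stripping and local search described above might look something like this minimal sketch. The table covers only the core Arabic letters and diacritics of the Buckwalter scheme published on corpus.quran.com; the Quranic annotation marks found in Tanzil text are omitted and would pass through unchanged. The function names and the `{ ref, buck }` verse shape are illustrative assumptions, not the code behind the jsfiddle or the demo.

```javascript
// Minimal Arabic <-> Buckwalter converter, vowel stripper and local search.
// ASSUMPTION: only core letters and diacritics are listed; the full table
// (including Quranic annotation marks) is on corpus.quran.com.
const ARABIC_TO_BUCK = {
  "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
  "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",
  "\u0629": "p", "\u062A": "t", "\u062B": "v", "\u062C": "j",
  "\u062D": "H", "\u062E": "x", "\u062F": "d", "\u0630": "*",
  "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
  "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",
  "\u0639": "E", "\u063A": "g", "\u0640": "_", "\u0641": "f",
  "\u0642": "q", "\u0643": "k", "\u0644": "l", "\u0645": "m",
  "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
  "\u064A": "y", "\u064B": "F", "\u064C": "N", "\u064D": "K",
  "\u064E": "a", "\u064F": "u", "\u0650": "i", "\u0651": "~",
  "\u0652": "o", "\u0670": "`", "\u0671": "{",
};

// Reverse table, derived from the forward one so the mapping stays 1-to-1.
const BUCK_TO_ARABIC = Object.fromEntries(
  Object.entries(ARABIC_TO_BUCK).map(([ar, bw]) => [bw, ar])
);

// Characters not in the table (spaces, digits, ...) pass through unchanged.
function convert(text, table) {
  return Array.from(text, (ch) => table[ch] ?? ch).join("");
}
const toBuck = (arabic) => convert(arabic, ARABIC_TO_BUCK);
const toArabic = (buck) => convert(buck, BUCK_TO_ARABIC);

// Tashkeel in Buckwalter: tanween (F N K), fatha/damma/kasra (a u i),
// shadda (~) and sukun (o). No Buckwalter letter uses these symbols.
function stripVowels(buck) {
  return buck.replace(/[FNKaui~o]/g, "");
}

// Several Buckwalter symbols ($ * | { }) are regex metacharacters.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// verses: array of { ref: "2:255", buck: "..." } (hypothetical shape).
// Returns refs of verses whose vowel-stripped text contains the query.
function searchBuck(verses, query) {
  const re = new RegExp(escapeRegExp(stripVowels(query)));
  return verses.filter((v) => re.test(stripVowels(v.buck))).map((v) => v.ref);
}

// Usage: the Arabic word bismi (with its diacritics) in Buckwalter.
console.log(toBuck("\u0628\u0650\u0633\u0652\u0645\u0650")); // prints bisomi
```

Because the reverse table is built from the forward one, adding a character to `ARABIC_TO_BUCK` automatically makes it reversible; the conversion stays lossless as long as no two Arabic characters share a Buck symbol.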
- Transliteration data based on Corpus:
  - Not done yet. It should be a simple 1-to-1 mapping using [http://corpus.quran.com/documentation/phonetic.jsp http://corpus.quran.com/documentation/phonetic.jsp]
  - Alternate approach: used a different data file, which has the transliteration encoded in HTML format. Using jQuery and regex, I can search by transliteration locally (no server call).
    - Demo: [http://qurandev.appspot.com/ http://qurandev.appspot.com/] (type in 2:255)
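A rough sketch of the alternate approach, written in plain JavaScript instead of jQuery. It assumes, hypothetically, that the transliteration file has been loaded into an object keyed by `surah:ayah` references with HTML strings as values; `htmlToText` and `searchTransliteration` are illustrative names. Tags are stripped with a simple regex, which suits small inline markup but is not a general HTML parser.

```javascript
// Sketch: local search over transliteration stored as HTML.
// ASSUMPTION (hypothetical data shape): `data` maps "surah:ayah" references
// to HTML strings. Plain regex code stands in for the jQuery used in the demo.

// Drops tags and decodes a few common entities (&amp; last, so "&amp;lt;"
// becomes "&lt;" rather than "<"). Not a general HTML parser.
function htmlToText(html) {
  return html
    .replace(/<[^>]*>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Escapes regex metacharacters so user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A query such as "2:255" is treated as a verse reference; anything else is
// a case-insensitive search over the tag-free transliteration.
function searchTransliteration(data, query) {
  const q = query.trim();
  if (/^\d+:\d+$/.test(q)) {
    return Object.prototype.hasOwnProperty.call(data, q) ? [q] : [];
  }
  const re = new RegExp(escapeRegExp(q), "i");
  return Object.keys(data).filter((ref) => re.test(htmlToText(data[ref])));
}
```

Stripping the tags before matching matters because markup can split a word (for example `<b>bis</b>mi`), which would otherwise hide it from a plain text search.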
Comments / Questions?