This year is the first year that the JLPT (Japanese Language Proficiency Test) will be run in Ireland. I’ve submitted my application form and now just have to wait until December. I searched the App Store for some apps that might help me brush up on my kanji and vocabulary for level 2, but I couldn’t find anything that worked the way I wanted it to. That left me with a great opportunity to make my own!
I’m using Core Data to store all the vocabulary. This allows me to search through it reasonably quickly by specifying appropriate NSPredicates. An NSPredicate might look something like this:
(isKanji == YES OR isKanjiHiragana == YES) AND level == 'JLPT 2'
This will give every JLPT level 2 word with a kanji character in it.
The app has three different types of test, and seven tests including variations. The three types are:
- Reading input: you must type in the correct reading.
- Choose correct answer: you must select the correct answer from a list of potential answers.
- Hide answer: the answer is hidden under a button. This one is pretty much a flash-card.
These three become seven when you see that you can take a Choose Correct Answer test and ask the user to select the correct meaning of a word, select the correct reading of a word, or select the correct word for a particular meaning. This extends to other tests and we end up with seven. Three classes exist, but there are seven objects with various different instance variables determining how they behave.
There are various categories of vocabulary: Katakana words, Hiragana words, Kanji words, Kanji+Hiragana words, prefixes, suffixes, miscellaneous (mix) and untestable (do not have reading or meaning available).
There are four JLPT levels (for now), and each of these levels has the categories above.
Every time a particular test has to generate a question, it has to find a word in the database suitable for the particular test. For example, if it’s Reading Input, there’s no point in testing a katakana word because it will be phonetic to begin with. It only makes sense to pick words with kanji in them somewhere. This means choosing a random managed object (i.e., a random row) that matches a particular NSPredicate from Core Data.
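As a rough illustration of that lookup, here is a Python sketch (not the Core Data code from the app). The word list, its level assignments and the helper names are all made up for the example; only the predicate itself comes from the post.

```python
import random

# Hypothetical in-memory stand-in for the vocabulary rows. The field names
# mirror the attributes used in the predicates; the data is invented.
WORDS = [
    {"text": "テレビ", "level": "JLPT 2", "isKanji": False, "isKanjiHiragana": False},
    {"text": "漢字", "level": "JLPT 2", "isKanji": True, "isKanjiHiragana": False},
    {"text": "食べる", "level": "JLPT 2", "isKanji": False, "isKanjiHiragana": True},
    {"text": "水", "level": "JLPT 4", "isKanji": True, "isKanjiHiragana": False},
]


def reading_input_predicate(word):
    """(isKanji == YES OR isKanjiHiragana == YES) AND level == 'JLPT 2'"""
    return (word["isKanji"] or word["isKanjiHiragana"]) and word["level"] == "JLPT 2"


def random_matching_word(words, predicate, rng=random):
    """Filter the rows first, then pick one at random.

    Returns None when no row satisfies the predicate.
    """
    candidates = [w for w in words if predicate(w)]
    return rng.choice(candidates) if candidates else None
```

The filtering step is the expensive part: it touches every row each time a question is generated, which is what the rest of this post tries to avoid.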
It turns out that this is slow. When I’d load a test, the screen would freeze for a couple of seconds while it sifted through the ~5k rows of the database with a complex predicate. There were a few things I did to speed this up.
Cache query results for each test
The first thing I tried was caching the array of possible questions/answers for each test in the test object. When a test queries Core Data for a list of objects matching a predicate, it saves that list for next time so it won’t have to search again. A test’s predicate does not change throughout its life, so caching everything with no expiry is safe.
This meant that the second time a particular search ran, it was fast. The first time, however, it was just as slow as before. And with seven different tests, that meant the user would have to experience seven slow-downs.
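A minimal Python sketch of that per-test cache, assuming a hypothetical `VocabularyTest` class and a `slow_fetch` function standing in for the real database query:

```python
class VocabularyTest:
    """Hypothetical test object that remembers its own query results."""

    def __init__(self, predicate, fetch):
        self.predicate = predicate  # fixed for the lifetime of the test
        self._fetch = fetch         # stand-in for the slow database query
        self._cached = None

    def candidates(self):
        # Only the first call pays for the query; later calls reuse the list.
        if self._cached is None:
            self._cached = self._fetch(self.predicate)
        return self._cached


fetch_calls = []  # records every predicate that actually hit the "database"


def slow_fetch(predicate):
    fetch_calls.append(predicate)
    return ["row matching " + predicate]
```

Because each test object owns its cache, two tests with the same predicate still run the query twice, which is exactly the repeated slow-down described above.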
Load all data into a shared array and search manually
Next I decided to try to centralise the slow-downs to one place. I loaded everything in the database into an array at startup, which takes a couple of seconds but no longer than it took for each individual test.
When a test would need to find a value matching its predicate, it would pull a random one from the shared array and check it against the predicate. It would keep doing this until it found one. Except, sometimes it didn’t ever find one. Or it would take a really really long time before it came across one that would work! If it was never going to find one, there was no way that it could know that… so this was starting to look like a dead end.
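The problem is easy to see in a small Python sketch of the same idea. The `max_attempts` cap is an addition for the example (the approach described above had no such limit), and the function name is made up:

```python
import random


def pick_by_rejection(words, predicate, max_attempts, rng):
    """Draw random words until one satisfies the predicate.

    `words` must be non-empty. Without the max_attempts cap this would loop
    forever when no word matches, because the loop has no way of knowing
    that a match does not exist.

    Returns (word, attempts_used), or (None, max_attempts) on giving up.
    """
    for attempt in range(1, max_attempts + 1):
        word = rng.choice(words)
        if predicate(word):
            return word, attempt
    return None, max_attempts
```

Even with a cap, a predicate that matches only a handful of the ~5k rows needs thousands of draws on average, and a cap that is hit says nothing about whether a match exists.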
Preload all data
Trying to centralise the slow-down without using a big, dumb shared array of vocabulary meant that each test would just fetch its own set of vocabulary matching its predicate at application start-up. Of course, this took many times longer than the previous method… but I had a plan…
Use a disk cache
The plan was to use a disk cache so that the hit wasn’t repeated across application launches. I implemented this, adding NSCoding to my Core Data object, but it was slow. Really slow. Writing or reading the file to disk took many times longer than querying the database to generate that array for me dynamically.
This was going to require a bit more intelligence…
Shared memory cache
Now knowing that querying the database directly was faster than loading query results saved to disk, I went about creating a memory cache shared across the whole app. This is basically an NSMutableDictionary with some fancy-pants methods around it. When the database is queried with a particular predicate, that predicate is stored as a key in the dictionary and the results from the database as the key’s object.
This meant that if you ran queries (or “fetch requests”, if you will… still not 100% comfortable with Core Data terminology) from different objects in the application but with the same predicates, the first would take time but the others would return immediately.
This meant that out of my seven tests, only five had to query the database and the other two could just pull from the cache.
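Here is a rough Python sketch of such a cache. The class name, the `misses` counter and the `fetch` callable are assumptions for the example; the real thing wraps an NSMutableDictionary and a Core Data fetch.

```python
class FetchCache:
    """Hypothetical app-wide cache mapping predicate string -> fetched rows."""

    def __init__(self, fetch):
        self._fetch = fetch   # stand-in for the real database query
        self._results = {}    # predicate string -> list of rows
        self.misses = 0       # how many times the database was really hit

    def fetch(self, predicate):
        # Predicates are compared as strings, so only textually identical
        # predicates share an entry.
        if predicate not in self._results:
            self.misses += 1
            self._results[predicate] = self._fetch(predicate)
        return self._results[predicate]
```

Every test asks the one shared `FetchCache` instead of querying directly, so the number of slow queries equals the number of distinct predicate strings rather than the number of tests.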
Eliminate redundancy in level checking
The predicates I was generating at first were a total mess. To check for all vocabulary in JLPT 1, it would look something like this:
level == 'JLPT 1' AND ( isKatakana == YES OR isHiragana == YES OR isKanji == YES OR isKanjiHiragana == YES OR isPrefix == YES OR isSuffix == YES OR isMiscellaneous == YES OR isUntestable == YES )
Out of all those properties beginning with “is”, every single object will have at least one set. Therefore, when the user has not disabled any category (i.e., checking for all vocabulary as above), we might as well just use
level == 'JLPT 1'
This meant that to check for everything in the database, rather than having that big mess above multiplied by 4 (as there are four levels), the following can be used instead:
level == 'JLPT 1' OR level == 'JLPT 2' OR level == 'JLPT 3' OR level == 'JLPT 4'
Right? Right… but wait a minute! When the user has not disabled any level the above can be simplified to:
TRUEPREDICATE
So when I was building a predicate to select everything in the database, it went from this
(level == 'JLPT 1' AND ( isKatakana == YES OR isHiragana == YES OR isKanji == YES OR isKanjiHiragana == YES OR isPrefix == YES OR isSuffix == YES OR isMiscellaneous == YES OR isUntestable == YES )) OR (level == 'JLPT 2' AND ( isKatakana == YES OR isHiragana == YES OR isKanji == YES OR isKanjiHiragana == YES OR isPrefix == YES OR isSuffix == YES OR isMiscellaneous == YES OR isUntestable == YES )) OR (level == 'JLPT 3' AND ( isKatakana == YES OR isHiragana == YES OR isKanji == YES OR isKanjiHiragana == YES OR isPrefix == YES OR isSuffix == YES OR isMiscellaneous == YES OR isUntestable == YES )) OR (level == 'JLPT 4' AND ( isKatakana == YES OR isHiragana == YES OR isKanji == YES OR isKanjiHiragana == YES OR isPrefix == YES OR isSuffix == YES OR isMiscellaneous == YES OR isUntestable == YES ))
to this
TRUEPREDICATE
This made loading roughly three times faster with the default settings. If a user deselects one category from each level, the speedup is lost entirely. However, I do not expect users to do this much. The default settings have everything on, and I expect that users may wish to disable a full category (possibly the extremes: too easy and/or too hard?).
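The two simplifications can be sketched in Python as a small predicate builder. The `settings` structure and function names are made up for the example, and returning `FALSEPREDICATE` when nothing is enabled is my own assumption about the edge case.

```python
CATEGORIES = ["isKatakana", "isHiragana", "isKanji", "isKanjiHiragana",
              "isPrefix", "isSuffix", "isMiscellaneous", "isUntestable"]
LEVELS = ["JLPT 1", "JLPT 2", "JLPT 3", "JLPT 4"]


def level_clause(level, enabled):
    """Predicate for one level, given the set of enabled category names."""
    if len(enabled) == len(CATEGORIES):
        # Every word has at least one flag set, so the flags add nothing.
        return f"level == '{level}'"
    flags = " OR ".join(f"{c} == YES" for c in CATEGORIES if c in enabled)
    return f"(level == '{level}' AND ( {flags} ))"


def build_predicate(settings):
    """settings maps a level name to the set of enabled category names."""
    active = {lvl: cats for lvl, cats in settings.items() if cats}
    if not active:
        return "FALSEPREDICATE"  # assumption: nothing enabled matches nothing
    everything = (len(active) == len(LEVELS) and
                  all(len(cats) == len(CATEGORIES) for cats in active.values()))
    if everything:
        return "TRUEPREDICATE"
    return " OR ".join(level_clause(l, active[l]) for l in LEVELS if l in active)
```

With every category enabled in every level this returns `TRUEPREDICATE`; disabling categories in one level only makes that level's clause longer and leaves the others short.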
Order operands by property name in logic operators
As I said earlier, by caching the query results against their predicates, I only had to run five of the seven queries. However, I noticed that a couple of these looked the same, except with the order of “sub-predicates” reversed. What I mean is that I saw
text != nil AND reading != nil
together with
reading != nil AND text != nil
These are obviously the same thing, logically, but as strings they are completely different. There is no way to prove predicates are logically equivalent on the iPhone so to solve this, I made sure that when I construct these predicates with logic operators such as AND and OR, I order the operands based on the property name’s alphabetic order. What that means is that either of the above will become
reading != nil AND text != nil
This is because “reading” < “text”.
This reduced the number of queries from five down to three!
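A minimal Python sketch of the ordering trick, assuming sub-predicates are plain strings that begin with the property name (the `combine` helper is hypothetical):

```python
def combine(operator, operands):
    """Join sub-predicates with AND/OR in a canonical order.

    Each operand starts with its property name, so a case-insensitive sort
    of the operand strings orders them by property name.
    """
    ordered = sorted(operands, key=str.lower)
    return f" {operator} ".join(ordered)
```

Both `combine("AND", ["text != nil", "reading != nil"])` and `combine("AND", ["reading != nil", "text != nil"])` give `reading != nil AND text != nil`, so the two requests share one cache entry.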
Skip != nil if the property is not optional
Finally, I noticed that many of my predicates contained
text != nil
The text property is not “optional”. That’s the same as having a “not null” column in the normal database world. So why bother checking if it’s nil if it’s never going to be nil?
NSEntityDescription can provide an NSDictionary of NSPropertyDescriptions, which let you know if any property isOptional or not. I used this to eliminate these redundant tests.
And this brought the number of unique queries down from three to just two!!
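Sketched in Python, the filtering step might look like this. The `OPTIONAL` table is a hypothetical stand-in for what the entity description reports, and which properties other than `text` are optional is assumed for the example.

```python
# Hypothetical stand-in for the entity description: property name -> optional?
# Only "text" being non-optional comes from the post; the rest is assumed.
OPTIONAL = {"text": False, "reading": True, "meaning": True}


def drop_redundant_nil_checks(operands, optional=OPTIONAL):
    """Remove '<name> != nil' operands whose property can never be nil.

    Unknown properties are treated as optional, so their checks are kept.
    """
    kept = []
    for operand in operands:
        name, _, rest = operand.partition(" ")
        if rest == "!= nil" and not optional.get(name, True):
            continue  # the check is always true, so it can be dropped
        kept.append(operand)
    return kept
```

If every operand is dropped, the caller is left with an empty list, which for an AND is the same as `TRUEPREDICATE`.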
Conclusions
By simplifying predicates to logically equivalent but more sensible versions where possible, I reduced loading time to a third with default settings.
By shuffling around operand ordering and removing redundant comparisons, less obviously logically-equivalent predicates were discovered: for the seven tests there are actually only two predicates required.
By using a shared memory cache there is the overhead of a few seconds at application startup, but this means that queries with equal (not equivalent) predicates are only run once. It’s up to the rest of the application to make sure that equivalent predicates end up being equal as much as possible.

お疲れさま。I’d like to see a PowerPC 10.4 compatible cocoa version of the app!
I’ll be stuck in Ireland for a good bit longer than you so it’s the least you can do for me!
I’ll give you the source and you can port it for yourself
Not sure how compatible some of the stuff is.
I know my beloved NSOperationQueue isn’t supported. I might give it a try if I have the time.
I’m only using that because I need to keep the main thread from blocking. There’s no requirement for it to run in parallel at all so you’d be free to just run the jobs one after another until they’re done, blocking user interaction.
It looks OK to me!
Looks good. Hopefully have an iPhone next month. Forgot to apply for the JLPT unfortunately, next year hopefully.
They might be running the JLPT bi-annually from next year so you could be doing it in June! Which level would you be doing? How long have you been studying it for?
Oh only really started this summer after a few attempts before. I would be doing the lowest level. Unfortunately I’m finding it hard to get any time for it with college work. Is the €99 for the iPhone development license a yearly fee or once off?
Unfortunately it’s every year. Fortunately, it’s only for running code on a device and submitting to the App Store so you can download the SDK for free and try it out with the iPhone Simulator.
Ya, I don’t have my iPhone yet so might do that first because I have to learn Obj-C and I’ve never used the Cocoa framework before. Cheers for that. Good luck with the project. Can’t wait to see it.
Pending approval now. Should be through in about 2 or 3 more days. Watch this space.