Friday, March 23, 2012

libferris in 512MB RAM on ARM5 at 1.2GHz

I mentioned yesterday that I had started hacking my infix indexing optimization into the CLucene index module of libferris. The short story is that on an aged ARM machine with fairly slow IO, this optimization makes a huge difference to regex queries on URLs. The numbers are below; they are for an index of ~/, /etc and /usr on the ARM5 machine running Debian, roughly 100,000 files. Cold1 and hot1 are the same query executed against cold and hot caches (the second run done just after the first). Likewise, cold2 and hot2 are a different query run cold and then hot. The first query returns 22 results and the second returns a single result.

run     old   new
cold1   4.9   3.6
hot1    2.0   1.0
cold2   2.6   1.9
hot2    1.7   1.0

As can be seen, the optimization affects both hot and cold cache times, which is quite handy as I often hit both: I start with a "-Z" evaluation to see how many matches there are, then specialize or generalize the query from there.
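To put the table in relative terms, here is a small Python sketch that works out the speedup for each run. The timings are copied straight from the table above; the `speedup` helper and the dictionary layout are just my own illustration, not anything from libferris.

```python
# Timings copied from the table above as (old, new) pairs,
# in whatever unit the table reports.
timings = {
    "cold1": (4.9, 3.6),
    "hot1": (2.0, 1.0),
    "cold2": (2.6, 1.9),
    "hot2": (1.7, 1.0),
}


def speedup(old, new):
    """Return how many times faster the new index is than the old one."""
    return old / new


for run, (old, new) in timings.items():
    saved = 100 * (old - new) / old  # percentage of the old time saved
    print(f"{run}: {speedup(old, new):.2f}x faster, {saved:.0f}% less time")
```

The cold runs come out at a bit under 1.4x and the hot runs at 1.7x to 2x, so the hot cache case benefits the most.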

While the numbers may seem large, keep in mind that this is a slow ARM machine running from an IO interface that needs some TLC. Using this indexing on an N9 or desktop machine will be much quicker, even if there are 10 times as many files indexed.

These ARM deb builds will be up sometime soon. I have to do a release of libferris with this optimization and the funky new JSON/REST support too.

1 comment:

monkeyiq said...

By infix query I mean looking for a URL with "vhost" anywhere in it. You can't use a prefix index or a postfix index (a reversed prefix index), because the query string can sit in the middle of the match. "vhost" can appear in any of the directory names or in the file name, and it can also appear inside a file or directory name, as in "kevhosting.txt".
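
For anyone wondering what an index for infix matching can look like, below is a minimal Python sketch of one common technique, a trigram index. This is an assumption for illustration only: it is not the libferris or CLucene code, and nothing above says libferris does it this way. The `TrigramIndex` class and the example paths (apart from "kevhosting.txt") are made up.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy infix index: maps each trigram to the URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of URL ids
        self.urls = []                    # URL id -> URL string

    def add(self, url):
        url_id = len(self.urls)
        self.urls.append(url)
        for gram in trigrams(url):
            self.postings[gram].add(url_id)

    def search(self, needle):
        grams = trigrams(needle)
        if grams:
            # Only URLs containing every trigram of the needle can match.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Needle shorter than 3 characters: fall back to a full scan.
            candidates = range(len(self.urls))
        # Verify each candidate, since trigrams may occur out of order.
        return sorted(self.urls[i] for i in candidates if needle in self.urls[i])


index = TrigramIndex()
index.add("/etc/apache2/vhosts.d/default.conf")  # hypothetical path
index.add("/home/user/kevhosting.txt")           # hypothetical directory
index.add("/usr/share/doc/host.txt")             # hypothetical path
print(index.search("vhost"))
```

The point of the sketch is that the lookup never depends on where in the URL the text sits, so "vhost" is found both in a directory name and inside "kevhosting.txt", while "host.txt" is skipped without scanning every URL.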