I started out allowing a single triple match with a filter(regex()) to restrict results. And this worked rather well, making the first one free as they say. So, noticing the little white rabbit that seemed to disappear into the SPARQL bushes, I decided to join in the high tea and mercury sniffing that so induces sanity. Over the course of versions 0.0.1 to 0.0.5 the SPARQL code has been getting better, little by little. The code is up at my sf.net page. But don't blame me if your SPARQL is not implemented yet or your triples somehow disappear.
Anyway, here is a little benchmark session. I'm using the data set generator and queries found here. To make the data I use:
$ cd /usr/local/java/bsbmtools
$ cat run.sh
#!/bin/bash
java -cp bin:lib/ssj.jar benchmark.generator.Generator "$@"
$ ./run.sh -fc -pc 1000 -s nt
$ mv dataset.nt thousand-prods.nt
$ mkdir -p /tmp/RDFBENCH
$ cd /tmp/RDFBENCH
$ mkdir mmap redland
Queries are run multiple times to ensure a hot disk cache. This is on a 3-disk RAID-5 and an Intel Q6600 with 8 GB of RAM.
The last query is not optimized properly in boostmmap yet, so it's far slower than it rightly should be. For benchmarking the boostmmap backend...
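For what it's worth, the "run it a few times first" step can be wrapped in a small helper. This is only a sketch: the name run_warm is made up here, and it assumes that simply re-running the same command is enough to pull the relevant files into the page cache.

```shell
# run_warm N CMD [ARGS...]
# Runs CMD N times, throwing away its output, so that a later timed
# run starts from a warm disk cache. run_warm is a hypothetical name.
# The body is a subshell, so the loop variables do not leak out.
run_warm() (
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1 || return 1
        i=$((i + 1))
    done
)

# Usage: warm up three times, then time the run that counts, e.g.
#   run_warm 3 cat triples.mmap
#   time <the sopranocmd invocation being measured>
```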
$ cd /tmp/RDFBENCH/mmap
$ time sopranocmd --backend boostmmap \
--serialization ntriples \
import /usr/local/java/bsbmtools/thousand-prods.nt >|out 2>&1
real 1m49.642s
210M triples.mmap*
$ time sopranocmd \
--backend boostmmap \
list "" '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' \
'<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product>' \
>| /tmp/out 2>&1
real 0m0.103s
grep Product /tmp/out | wc -l
1001
## based on Query 6
$ time sopranocmd \
--backend boostmmap query \
"
select ?what ?lab
where
{
?what <http://www.w3.org/2000/01/rdf-schema#label> ?lab .
filter( regex( str( ?lab ), 'excites' ))
}"
?lab -> <yawned%20excites%20deflower>;
?what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature295>
?lab -> <goofs%20excites%20enigmata>;
?what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature3276>
real 0m0.091s
$ time sopranocmd --backend boostmmap query \
"
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?offer ?price
where {
?offer bsbm:product <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromProducer1/Product5> .
?offer bsbm:vendor ?vendor .
?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#ES> .
?offer dc:publisher ?vendor .
?offer bsbm:price ?price .
}"
0.93sec
Note that this 0.9 seconds is shameful and needs to be optimized back to under 0.1 seconds.
For redland,
$ cd /tmp/RDFBENCH/redland
$ time sopranocmd --backend redland \
--serialization ntriples \
import /usr/local/java/bsbmtools/thousand-prods.nt \
>|/tmp/out 2>&1
real 38m34.735s
480mb
$ time sopranocmd --backend redland \
list "" \
'<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' \
'<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product>' \
>| /tmp/out 2>&1
real 0m0.096s
grep Product /tmp/out | wc -l
1000
So for just listStatements(), redland and mmap are fairly equal in performance, which you might expect for a single indexed lookup. In libferris I had restricted RDF usage to raw triple probes like this because I used redland directly prior to version 1.4.x of libferris.
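A rough cross-check for a probe like this, independent of either backend, is to count the matching rdf:type lines straight out of the N-Triples file. The sketch below assumes one triple per line with whitespace-separated terms, and count_type_triples is a name invented for this example.

```shell
# count_type_triples FILE CLASS_IRI
# Counts lines of an N-Triples file whose predicate is rdf:type and
# whose object is <CLASS_IRI>. Hypothetical helper, not part of soprano.
count_type_triples() {
    awk -v p='<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' \
        -v o="<$2>" \
        '$2 == p && $3 == o { n++ } END { print n + 0 }' "$1"
}

# Usage: pass the class IRI exactly as it appears in the data file,
# since IRIs are case-sensitive:
#   count_type_triples thousand-prods.nt "$CLASS_IRI"
```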
So for SPARQL,
## based on Query 6
$ time sopranocmd --backend redland query \
"
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?what ?lab
where
{
?what rdfs:label ?lab .
filter( regex( str( ?lab ), 'excites' ))
}"
what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature295>;
lab -> "yawned excites deflower"
what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature3276>;
lab -> "goofs excites enigmata"
real 0m3.855s
Gah, and I didn't slip up and put the 3 on the left side of the dot there. We are talking about 0.1 seconds for boostmmap against 3.86 seconds for redland.
$ time sopranocmd --backend redland query \
"
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?offer ?price
where {
?offer bsbm:product <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromProducer1/Product5> .
?offer bsbm:price ?price .
?offer bsbm:vendor ?vendor .
?offer dc:publisher ?vendor .
?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#ES> .
}"
real 0m7.134s
Since this query isn't handled well by boostmmap yet, the gap here is only about 1 second against 7. But I think I can get boostmmap to resolve it in much less than 1 second. This is not meant to make redland look bad; its SPARQL implementation is much more complete than boostmmap's will likely be any time soon. Creating an optimal query plan for the full SPARQL language will be an interesting challenge.
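To put the timings above side by side, here is a small sketch that turns the `time` output into redland-to-boostmmap ratios. The helper names to_seconds and ratio are made up for this example; the numbers are the ones reported in the sessions above.

```shell
# to_seconds converts a `time`-style value such as 38m34.735s to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# ratio A B prints A / B to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# redland time divided by boostmmap time for each step
ratio "$(to_seconds 38m34.735s)" "$(to_seconds 1m49.642s)"  # import:       21.1
ratio "$(to_seconds 0m3.855s)"   "$(to_seconds 0m0.091s)"   # regex filter: 42.4
ratio "$(to_seconds 0m7.134s)"   0.93                       # offer join:   7.7
```

So the import is roughly 21 times faster, the regex query roughly 42 times, and the not-yet-optimized join under 8 times.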
Development might be bursty as I don't know what time I can spare for improving the SPARQL completeness in the short term.