[Pellet-users] Slow inference/Restrict inference
Evren Sirin
evren at clarkparsia.com
Mon Feb 4 16:27:44 UTC 2008
Hi Alejandro,
I think you need to change your query, not your ontology, to get better performance (trying a subset of the ontology will no doubt improve the performance, but I don't think it is required). Currently you are running the following query:
model.listStatements(i, null, (RDFNode) null);
which tries to find all the types, property assertions, sameAs and differentFrom inferences regarding that individual. This is going to take considerable time, especially because querying sameAs and differentFrom assertions is generally slow. Repeating this for all 600 individuals in the ontology will be quite slow.
If you query only types and property assertions, things will be much better. I modified your code as shown at the end of this message and added explicit timing measurements to show how long each operation takes. The explicit calls to classify and realize are not required; they are made in the code only to time these two operations separately. The results I get on my laptop are as follows (timings given in milliseconds):
Read | 8573
Prepare | 1062
Classify | 924
Realize | 36002
QueryTypes | 17
QueryProperties | 6
QuerySames | 0
QueryDifferents | 32525
QueryAll | 31842
The operations Read, Prepare, Classify and Realize are all one-time operations that take a total of about 46.5 seconds. QueryAll is the query you were trying, which takes about 32 seconds. QueryTypes, QueryProperties, QuerySames and QueryDifferents break up that query into four disjoint queries (the union of the results of those four queries is exactly the same set of results as QueryAll). As you can see, it is just querying differentFrom that takes all the time (even though you set UNA to true, Pellet tries all combinations of individuals to see whether they are different or not). I would think that QueryTypes and QueryProperties are what you are interested in, and together they take a total of 23 ms (about three orders of magnitude faster than QueryAll).
Also note that the performance of QueryTypes and QueryProperties will not be affected by the UNA option. So the decision to use UNA should be based on semantic considerations, not performance results.
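As a quick check of the figures in the timing table, here is a small sketch that adds up the one-time operations and compares QueryAll with the two cheap queries. The class and method names (TimingSummary, timings, sum) are made up for illustration and are not part of Pellet or Jena; the numbers are simply copied from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: names are hypothetical, values are copied from the
// timing table in this message (all in milliseconds).
final class TimingSummary {

    // Returns the measured timings keyed by operation name.
    static Map<String, Long> timings() {
        Map<String, Long> t = new LinkedHashMap<>();
        t.put("Read", 8573L);
        t.put("Prepare", 1062L);
        t.put("Classify", 924L);
        t.put("Realize", 36002L);
        t.put("QueryTypes", 17L);
        t.put("QueryProperties", 6L);
        t.put("QuerySames", 0L);
        t.put("QueryDifferents", 32525L);
        t.put("QueryAll", 31842L);
        return t;
    }

    // Adds up the timings of the named operations.
    static long sum(Map<String, Long> t, String... names) {
        long total = 0;
        for (String name : names) {
            total += t.get(name);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> t = timings();
        long oneTime = sum(t, "Read", "Prepare", "Classify", "Realize");
        long cheap = sum(t, "QueryTypes", "QueryProperties");
        long all = t.get("QueryAll");

        System.out.println("One-time setup (ms): " + oneTime);
        System.out.println("Types + properties (ms): " + cheap);
        // Integer division, so the ratio is rounded down.
        System.out.println("QueryAll / (types + properties): " + (all / cheap));
    }
}
```

The ratio only reflects this single run on one laptop, so treat it as a rough indication rather than a benchmark.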
Cheers,
Evren
private void createModelToLoadData() {
    PelletOptions.USE_UNIQUE_NAME_ASSUMPTION = true;

    Timers timers = new Timers();

    OntModel model = ModelFactory.createOntologyModel( PelletReasonerFactory.THE_SPEC );

    // One-time operations: load, prepare, classify, realize
    timers.startTimer("Read");
    model.read( "http://www.jalojavier.es/humandisease.owl" );
    timers.stopTimer("Read");

    timers.startTimer("Prepare");
    model.prepare();
    timers.stopTimer("Prepare");

    // Explicit classify/realize calls are only here to time them separately
    timers.startTimer("Classify");
    ((PelletInfGraph) model.getGraph()).getKB().classify();
    timers.stopTimer("Classify");

    timers.startTimer("Realize");
    ((PelletInfGraph) model.getGraph()).getKB().realize();
    timers.stopTimer("Realize");

    // The individual PR_SYMS_A_B_C
    Individual i1 = model.getIndividual(
        "http://www.jalojavier.es/humandisease.owl" + "#PR_SYMS_A_B_C" );

    int count = 0;

    // The four disjoint queries that together cover QueryAll
    timers.startTimer("QueryTypes");
    count += countStatements( model, i1, RDF.type, null );
    timers.stopTimer("QueryTypes");

    timers.startTimer("QueryProperties");
    for( Iterator i = model.listOntProperties(); i.hasNext(); )
        count += countStatements( model, i1, (Property) i.next(), null );
    timers.stopTimer("QueryProperties");

    timers.startTimer("QuerySames");
    count += countStatements( model, i1, OWL.sameAs, null );
    timers.stopTimer("QuerySames");

    timers.startTimer("QueryDifferents");
    count += countStatements( model, i1, OWL.differentFrom, null );
    timers.stopTimer("QueryDifferents");

    System.out.println( "Count for the first 4 queries: " + count );

    // The original query: every statement about the individual
    timers.startTimer("QueryAll");
    count = countStatements( model, i1, null, null );
    timers.stopTimer("QueryAll");

    System.out.println( "Count for the last query: " + count );

    timers.print( true, null );
}

// Counts the statements matching the pattern (s, p, o); null is a wildcard
private int countStatements(OntModel m, Resource s, Property p, RDFNode o) {
    int c = 0;
    for( StmtIterator i = m.listStatements( s, p, o ); i.hasNext(); ) {
        i.nextStatement();
        c++;
    }
    return c;
}
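If you want to take similar measurements in code that does not have Pellet's Timers utility on the classpath, a rough stand-in using only the JDK might look like the sketch below. SimpleTimers and its methods are hypothetical names made up for this example; it does not reproduce the Timers API, it only accumulates elapsed time per name in the same spirit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a named-timer utility; this is NOT Pellet's
// Timers class, just a JDK-only sketch of the same idea.
final class SimpleTimers {
    private final Map<String, Long> startedAt = new LinkedHashMap<>();
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    // Records the start time for the named timer.
    void start(String name) {
        startedAt.put(name, System.nanoTime());
    }

    // Adds the time elapsed since start(name) to the timer's running total.
    void stop(String name) {
        Long begin = startedAt.remove(name);
        if (begin == null) {
            throw new IllegalStateException("Timer not started: " + name);
        }
        totalNanos.merge(name, System.nanoTime() - begin, Long::sum);
    }

    // Accumulated time in whole milliseconds; 0 for an unknown timer.
    long millis(String name) {
        return totalNanos.getOrDefault(name, 0L) / 1_000_000L;
    }

    // Prints one "name | millis" line per timer, in first-stopped order.
    void print() {
        for (String name : totalNanos.keySet()) {
            System.out.println(name + " | " + millis(name));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleTimers timers = new SimpleTimers();
        timers.start("Sleep");
        Thread.sleep(20); // placeholder for the operation being measured
        timers.stop("Sleep");
        timers.print();
    }
}
```

Wrapping each query in its own start/stop pair, as in the listing above, is what makes it possible to see which part of a combined query is the expensive one.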
On 1/31/08 2:25 PM, Alejandro Rodríguez González wrote:
> Hello,
>
> I asked a few days ago how to restrict the inference domain in
> Pellet, but no one answered, so I will try asking again
> (maybe no one understood me, I don't know).
>
> I have an ontology ( http://www.jalojavier.es/humandisease.owl ) and
> Jena+Pellet code ( http://rafb.net/p/k1yHie11.html ) to make the inferences.
>
> The problem is that the inference I am trying to make with that code
> takes a lot of time (close to 180 seconds).
>
> I ran some tests splitting the ontology into small parts, and I think
> the problem is the number of instances in the ontology (I now have
> close to 600 individuals and 1500 classes).
>
> I think it may be possible to improve the inference speed by
> restricting the inference domain. I have "Diseases", "Symptoms",
> and "Lab Test" superclasses that are involved in the inferences,
> but the results of the inferences are only subclasses of "Diseases"
> (which has only about 30 instances).
>
> Is it possible to tell Pellet to search for results only in this
> superclass and ignore the rest?
>
> Or is there any other solution that improves the inference speed?
>
> Thanks.