[Pellet-users] Slow inference/Restrict inference
Evren Sirin
evren at clarkparsia.com
Sun Feb 10 23:57:15 UTC 2008
On 2/5/08 5:14 PM, Alejandro Rodríguez González wrote:
> Hi again,
>
> I have a new question about this topic.
>
> I was running some tests and I saw that type queries are very fast
> when the individuals are already in the loaded ontology.
>
> If, for example, I create a new individual:
>
> Individual i1 = modelo.createIndividual(
>     this.getOntologyURI() + "#PR_SYMS_A_B_C",
>     modelo.getResource(this.getOntologyURI() + "#Consult")); // PR_SYMS_A_B_C
> i1.addProperty(hasSymp, modelo.getResource(getOntologyURI() + "#SYM_A"));
> i1.addProperty(hasSymp, modelo.getResource(getOntologyURI() + "#SYM_B"));
> i1.addProperty(hasSymp, modelo.getResource(getOntologyURI() + "#SYM_C"));
>
> And I try to get the inferred classes, it takes approximately 30
> seconds (apart from prepare, classify and realize).
Do you realize again after the model is changed? When the model is
changed in any way, previous realization results are completely
discarded and the reasoning steps will be repeated.
>
> The question is, can this inference be optimized?
In most cases, instance retrieval can be done efficiently without
classification or realization. There was a bug introduced in version
1.5.0 (and currently fixed in the SVN [1]) which slows down instance
retrieval if classification has not occurred. But if you just classify
after the model has changed and then directly query the model to
retrieve the instances it should work reasonably well. This way you can
skip the realize step, which takes most of the time. Certain queries
(querying for direct instances, querying for types, etc.) will still
trigger realization, so the first such query will be rather slow.
Cheers,
Evren
[1] http://cvsdude.com/trac/clark-parsia/pellet-devel/ticket/77
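
The caching behavior described in this reply can be pictured with a toy model. The sketch below is plain Java with no Pellet or Jena dependency. The class `LazyReasoningSketch` and its methods are hypothetical names invented for illustration, and the "realization" here is only a copy of the asserted types, not real reasoning. It is meant to show the control flow and nothing more: any change discards the cached results, and the first type query afterwards pays for the expensive step again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a reasoner-backed model; NOT Pellet's actual implementation. */
final class LazyReasoningSketch {
    // Asserted facts: individual name -> asserted type names.
    private final Map<String, Set<String>> asserted = new HashMap<>();
    // Cached result of the expensive step; null means it must be recomputed.
    private Map<String, Set<String>> realized = null;
    // Counts how often the expensive step actually ran.
    private int realizeRuns = 0;

    /** Any change to the model discards the cached results. */
    void addType(String individual, String type) {
        asserted.computeIfAbsent(individual, k -> new HashSet<>()).add(type);
        realized = null;
    }

    /** Expensive step: run explicitly upfront, or lazily by the first query. */
    void realize() {
        if (realized != null) {
            return; // cache is still valid, nothing to do
        }
        realizeRuns++;
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : asserted.entrySet()) {
            result.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        realized = result;
    }

    /** A type query triggers the expensive step if no valid cache exists. */
    Set<String> typesOf(String individual) {
        realize();
        return realized.getOrDefault(individual, new HashSet<>());
    }

    int realizeRuns() {
        return realizeRuns;
    }

    public static void main(String[] args) {
        LazyReasoningSketch kb = new LazyReasoningSketch();
        kb.addType("SYM_A", "Symptom");          // illustrative names only
        kb.realize();                            // reasoning done upfront
        kb.typesOf("SYM_A");                     // answered from the cache
        kb.addType("PR_SYMS_A_B_C", "Consult");  // the change invalidates the cache
        kb.typesOf("PR_SYMS_A_B_C");             // first query repeats the expensive step
        System.out.println("realize runs: " + kb.realizeRuns());
    }
}
```

Running `main` reports two runs of the expensive step: one upfront and one triggered by the first query after the new individual was added.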
> Thanks :-)
>
> Evren Sirin wrote:
>> On 2/4/08 2:54 PM, Alejandro Rodríguez González wrote:
>>> Hi Evren,
>>>
>>> First of all, thanks for your answer; this project was starting to
>>> exasperate me.. :-)
>>>
>>> I read your email and the attached code carefully, and ran some
>>> tests. Indeed, the problem was with the query that I was making,
>>> but I have a question.
>>>
>>> I tried running the type query without calling prepare, classify
>>> and realize first, and it took a very long time. I suppose I need
>>> to call prepare, classify and realize before making the queries. Is
>>> this true?
>>
>> Yes, if you call prepare/classify/realize explicitly upfront then
>> most of the reasoning will be done at that time and the subsequent
>> queries will be faster (provided that your queries correspond to what
>> has been cached during classification and realization which is why I
>> suggested not using queries with a null value in the predicate
>> position).
>>
>>>
>>> Because if I don't do this, will Pellet run prepare, classify and
>>> realize for me when I make a query?
>>
>> Yes, depending on the type of the query, classification and/or
>> realization will be triggered. This will cause the first query to be
>> very slow compared to subsequent queries.
>>
>>>
>>> I think that if I only need to run prepare, classify and realize
>>> once (when the program starts, for example), it will not be a
>>> problem. Is this approach correct?
>>
>> Yes, doing the reasoning upfront generally makes sense.
>>
>> Cheers,
>> Evren
>>
>>>
>>> Thanks!!
>>>
>>>
>>> Evren Sirin wrote:
>>>> Hi Alejandro,
>>>>
>>>> I think you need to change your query not your ontology to get
>>>> better performance (trying a subset of the ontology will no doubt
>>>> improve the performance but I don't think it is required).
>>>> Currently you are running the following query
>>>>
>>>> model.listStatements(i, null, (RDFNode) null);
>>>>
>>>> which would try to find all the types, property assertions, sameAs
>>>> and differentFrom inferences regarding that individual. This is
>>>> going to take considerable time, especially because querying sameAs
>>>> and differentFrom assertions is generally slow. Repeating this for
>>>> all 600 individuals in the ontology will be quite slow.
>>>> If you just query types and property assertions things would be
>>>> much better. I modified your code as shown at the end of this
>>>> message and added explicit timing measurements to show how long
>>>> each operation takes. The explicit calls to classify and realize
>>>> are not required; they are made in the code only to time these two
>>>> operations separately. Results I get on my laptop look like this
>>>> (timings given in milliseconds):
>>>>
>>>> Read | 8573
>>>> Prepare | 1062
>>>> Classify | 924
>>>> Realize | 36002
>>>> QueryTypes | 17
>>>> QueryProperties | 6
>>>> QuerySames | 0
>>>> QueryDifferents | 32525
>>>> QueryAll | 31842
>>>>
>>>> The operations Read, Prepare, Classify and Realize are all one-time
>>>> operations that take a total of about 46.5 seconds. QueryAll is the
>>>> query you were trying, which takes about 32 seconds. QueryTypes,
>>>> QueryProperties, QuerySames and QueryDifferents break up the query
>>>> into four disjoint queries (the union of the results of those four
>>>> queries is exactly the same set of results as QueryAll). As you can
>>>> see, it is just querying differentFrom's that is taking all the time
>>>> (even though you set UNA to true, Pellet tries all combinations of
>>>> individuals to see if they are different from each other or not). I
>>>> would think that just QueryTypes and QueryProperties are what you
>>>> are interested in, and they take a total of 23ms (three orders of
>>>> magnitude faster than QueryAll).
>>>> Also note that the performance of QueryTypes and QueryProperties
>>>> will not be affected by the UNA option. So the decision to use UNA
>>>> should be based on semantic considerations, not performance results.
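
The figures in the timing table can be rechecked with a few lines of arithmetic. The sketch below is plain Java; the class name `TimingSummary` and its helper methods are hypothetical, and the numbers are copied from the table in this message, not measured again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Recomputes the totals from the timing table; values are in milliseconds. */
final class TimingSummary {
    /** Timings copied from the table in the message. */
    static Map<String, Long> timings() {
        Map<String, Long> t = new LinkedHashMap<>();
        t.put("Read", 8573L);
        t.put("Prepare", 1062L);
        t.put("Classify", 924L);
        t.put("Realize", 36002L);
        t.put("QueryTypes", 17L);
        t.put("QueryProperties", 6L);
        t.put("QuerySames", 0L);
        t.put("QueryDifferents", 32525L);
        t.put("QueryAll", 31842L);
        return t;
    }

    /** Sums the timings of the named operations. */
    static long sum(Map<String, Long> t, String... names) {
        long total = 0;
        for (String name : names) {
            total += t.get(name);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> t = timings();
        long oneTime = sum(t, "Read", "Prepare", "Classify", "Realize");
        long targeted = sum(t, "QueryTypes", "QueryProperties");
        long split = sum(t, "QueryTypes", "QueryProperties",
                         "QuerySames", "QueryDifferents");
        long all = t.get("QueryAll");

        System.out.println("One-time setup:        " + oneTime + " ms");
        System.out.println("Types + properties:    " + targeted + " ms");
        System.out.println("All four split queries: " + split + " ms");
        System.out.println("QueryAll:              " + all + " ms");
        System.out.println("QueryAll / targeted:   " + (all / targeted) + "x");
    }
}
```

The one-time operations add up to 46561 ms, the two targeted queries to 23 ms, and QueryAll is roughly 1384 times slower than the two targeted queries combined.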
>>>>
>>>> Cheers,
>>>> Evren
>>>>
>>>>
>>>> private void createModelToLoadData() {
>>>>     PelletOptions.USE_UNIQUE_NAME_ASSUMPTION = true;
>>>>     Timers timers = new Timers();
>>>>
>>>>     OntModel model = ModelFactory.createOntologyModel(
>>>>         PelletReasonerFactory.THE_SPEC );
>>>>
>>>>     // One-time operations, each timed separately
>>>>     timers.startTimer("Read");
>>>>     model.read( "http://www.jalojavier.es/humandisease.owl" );
>>>>     timers.stopTimer("Read");
>>>>
>>>>     timers.startTimer("Prepare");
>>>>     model.prepare();
>>>>     timers.stopTimer("Prepare");
>>>>
>>>>     timers.startTimer("Classify");
>>>>     ((PelletInfGraph) model.getGraph()).getKB().classify();
>>>>     timers.stopTimer("Classify");
>>>>
>>>>     timers.startTimer("Realize");
>>>>     ((PelletInfGraph) model.getGraph()).getKB().realize();
>>>>     timers.stopTimer("Realize");
>>>>
>>>>     Individual i1 = model.getIndividual(
>>>>         "http://www.jalojavier.es/humandisease.owl" + "#PR_SYMS_A_B_C" );
>>>>
>>>>     // Four disjoint queries that together cover the wildcard query
>>>>     int count = 0;
>>>>
>>>>     timers.startTimer("QueryTypes");
>>>>     count += countStatements( model, i1, RDF.type, null );
>>>>     timers.stopTimer("QueryTypes");
>>>>
>>>>     timers.startTimer("QueryProperties");
>>>>     for( Iterator i = model.listOntProperties(); i.hasNext(); )
>>>>         count += countStatements( model, i1, (Property) i.next(), null );
>>>>     timers.stopTimer("QueryProperties");
>>>>
>>>>     timers.startTimer("QuerySames");
>>>>     count += countStatements( model, i1, OWL.sameAs, null );
>>>>     timers.stopTimer("QuerySames");
>>>>
>>>>     timers.startTimer("QueryDifferents");
>>>>     count += countStatements( model, i1, OWL.differentFrom, null );
>>>>     timers.stopTimer("QueryDifferents");
>>>>
>>>>     System.out.println( "Count for the first 4 queries: " + count );
>>>>
>>>>     // The original wildcard query, for comparison
>>>>     timers.startTimer("QueryAll");
>>>>     count = countStatements( model, i1, null, null );
>>>>     timers.stopTimer("QueryAll");
>>>>
>>>>     System.out.println( "Count for the last query: " + count );
>>>>
>>>>     timers.print( true, null );
>>>> }
>>>>
>>>> // Counts the statements matching the pattern (null acts as a wildcard)
>>>> private int countStatements(OntModel m, Resource s, Property p,
>>>>         RDFNode o) {
>>>>     int c = 0;
>>>>     for( StmtIterator i = m.listStatements( s, p, o ); i.hasNext(); ) {
>>>>         i.nextStatement();
>>>>         c++;
>>>>     }
>>>>     return c;
>>>> }
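
For readers who want to reproduce this kind of per-operation measurement without the Pellet utilities, here is a minimal sketch of a named stopwatch using only the Java standard library. `SimpleTimers` is a hypothetical class written for illustration. It is not Pellet's `Timers` class and makes no claim about how that class is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal named stopwatch; a stdlib stand-in, not Pellet's Timers class. */
final class SimpleTimers {
    // Start time (in nanoseconds) of each timer that is currently running.
    private final Map<String, Long> startedAt = new LinkedHashMap<>();
    // Accumulated elapsed time (in nanoseconds) per timer name.
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    void start(String name) {
        startedAt.put(name, System.nanoTime());
    }

    void stop(String name) {
        Long begin = startedAt.remove(name);
        if (begin == null) {
            throw new IllegalStateException("Timer not started: " + name);
        }
        totalNanos.merge(name, System.nanoTime() - begin, Long::sum);
    }

    /** Accumulated time in whole milliseconds; 0 if the timer never ran. */
    long totalMillis(String name) {
        return totalNanos.getOrDefault(name, 0L) / 1_000_000;
    }

    /** Prints one "name | millis" row per timer, in first-use order. */
    void print() {
        for (Map.Entry<String, Long> e : totalNanos.entrySet()) {
            System.out.printf("%-16s | %d%n", e.getKey(), e.getValue() / 1_000_000);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleTimers timers = new SimpleTimers();
        timers.start("Sleep");
        Thread.sleep(20); // placeholder for the operation being measured
        timers.stop("Sleep");
        timers.print();
    }
}
```

Wrapping each reasoning step and each query in its own `start`/`stop` pair, as the code above does with `Timers`, makes it easy to see which single operation dominates the total.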
>>>>
>>>> On 1/31/08 2:25 PM, Alejandro Rodríguez González wrote:
>>>>> Hello,
>>>>>
>>>>> I asked a few days ago how to restrict the inference domain in
>>>>> Pellet, but no one answered, so I will try asking the question
>>>>> again (maybe no one understood me, I don't know).
>>>>>
>>>>> I have an ontology ( http://www.jalojavier.es/humandisease.owl )
>>>>> and Jena+Pellet code ( http://rafb.net/p/k1yHie11.html ) to make
>>>>> the inferences.
>>>>>
>>>>> The problem is that the inference I am trying to make with that
>>>>> code takes a lot of time (close to 180 seconds).
>>>>>
>>>>> I ran some tests splitting the ontology into small parts, and I
>>>>> think that the problem is the number of instances in the ontology
>>>>> (I now have close to 600 individuals and 1500 classes).
>>>>>
>>>>> I think it may be possible to optimize the inference speed by
>>>>> restricting the inference domain. I have "Diseases", "Symptoms",
>>>>> and "Lab Test" superclasses that are involved in the inferences,
>>>>> but the results of the inferences are only subclasses of
>>>>> "Diseases" (which has only about 30 instances).
>>>>>
>>>>> Is it possible to tell Pellet to search for results only in this
>>>>> superclass and ignore the rest?
>>>>>
>>>>> Or is there any other solution that would optimize the inference
>>>>> speed?
>>>>>
>>>>> Thanks.