[Pellet-users] Slow inference/Restrict inference

Alejandro Rodríguez González jalo.javier at gmail.com
Tue Feb 5 22:14:08 UTC 2008


Hi again,

I have a new question about this topic.

I was running some tests and saw that type queries are very fast when 
the individuals are already in the loaded ontology.

If, for example, I create a new individual:

        Individual i1 = modelo.createIndividual(
                this.getOntologyURI() + "#PR_SYMS_A_B_C",
                modelo.getResource(this.getOntologyURI() + "#Consult")); // PR_SYMS_A_B_C

        i1.addProperty(hasSymp, modelo.getResource(getOntologyURI() + "#SYM_A"));
        i1.addProperty(hasSymp, modelo.getResource(getOntologyURI() + "#SYM_B"));
        i1.addProperty(hasSymp, modelo.getResource(getOntologyURI() + "#SYM_C"));

and then try to get its inferred classes, it takes approximately 30 
seconds (apart from prepare, classify and realize).

The question is: can this inference be optimized? Thanks :-)
       

Evren Sirin wrote:
> On 2/4/08 2:54 PM, Alejandro Rodríguez González wrote:
>> Hi Evren,
>>
>> First of all, thanks for your answer; this project was starting to 
>> exasperate me.. :-)
>>
>> I read your email and the attached code carefully and ran some 
>> tests. Indeed, the problem was with the query I was making, but I 
>> have a question.
>>
>> I tried making the type query without calling prepare, classify and 
>> realize first, and it took a very long time. I suppose I have to call 
>> prepare, classify and realize before making the queries. Is this true?
>
> Yes, if you call prepare/classify/realize explicitly upfront then most 
> of the reasoning will be done at that time and the subsequent queries 
> will be faster (provided that your queries correspond to what has been 
> cached during classification and realization which is why I suggested 
> not using queries with a null value in the predicate position).
>
>>
>> Because if I don't do this, when I make a query, will Pellet run 
>> prepare, classify and realize for me?
>
> Yes, depending on the type of the query, classification and/or 
> realization will be triggered. This will cause the first query to be 
> very slow compared to subsequent queries.
>
>>
>> I think that if I only need to run prepare, classify and realize 
>> once (when the program starts, for example), it will not be a 
>> problem. Is this approach correct?
>
> Yes, doing the reasoning upfront generally makes sense.
>
> Cheers,
> Evren
>
>>
>> Thanks!!
>>
>>
>> Evren Sirin wrote:
>>> Hi Alejandro,
>>>
>>> I think you need to change your query, not your ontology, to get 
>>> better performance (trying a subset of the ontology will no doubt 
>>> improve the performance, but I don't think it is required). Currently 
>>> you are running the following query:
>>>
>>> model.listStatements(i, null, (RDFNode) null);
>>>
>>> which tries to find all the types, property assertions, sameAs and 
>>> differentFrom inferences regarding that individual. This is going to 
>>> take considerable time, especially because querying sameAs and 
>>> differentFrom assertions is generally slow. Repeating this for all 
>>> 600 individuals in the ontology will be quite slow.
>>>
>>> If you just query types and property assertions, things will be much 
>>> better. I modified your code as shown at the end of this message and 
>>> added explicit timing measurements to show how long each operation 
>>> takes. The explicit calls to classify and realize are not required; 
>>> they are made in the code only to time these two operations 
>>> separately. The results I get on my laptop are as follows (timings 
>>> given in milliseconds):
>>>
>>> Read            |      8573
>>> Prepare         |      1062
>>> Classify        |       924
>>> Realize         |     36002
>>> QueryTypes      |        17
>>> QueryProperties |         6
>>> QuerySames      |         0
>>> QueryDifferents |     32525
>>> QueryAll        |     31842
>>>
>>> The operations Read, Prepare, Classify and Realize are all one-time 
>>> operations that take a total of about 46 seconds. QueryAll is the 
>>> query you were trying, which takes about 32 seconds. QueryTypes, 
>>> QueryProperties, QuerySames and QueryDifferents break up the query 
>>> into four disjoint queries (the union of the results of those four 
>>> queries is exactly the same set of results as QueryAll). As you can 
>>> see, it is just querying differentFrom's that is taking all the time 
>>> (even though you set UNA to true, Pellet tries all combinations of 
>>> individuals to see whether they are different from each other or 
>>> not). I would think that QueryTypes and QueryProperties are all you 
>>> are interested in, and they take a total of 23 ms (roughly three 
>>> orders of magnitude faster than QueryAll).
>>>
>>> Also note that the performance of QueryTypes and QueryProperties 
>>> will not be affected by the UNA option. So the decision to use UNA 
>>> should be based on semantic considerations, not performance results.
>>>
>>> Cheers,
>>> Evren
>>>
>>>
>>>    private void createModelToLoadData() {
>>>        PelletOptions.USE_UNIQUE_NAME_ASSUMPTION = true;
>>>
>>>        Timers timers = new Timers();
>>>
>>>        OntModel model = ModelFactory.createOntologyModel(
>>>                PelletReasonerFactory.THE_SPEC );
>>>
>>>        timers.startTimer("Read");
>>>        model.read( "http://www.jalojavier.es/humandisease.owl" );
>>>        timers.stopTimer("Read");
>>>
>>>        timers.startTimer("Prepare");
>>>        model.prepare();
>>>        timers.stopTimer("Prepare");
>>>
>>>        timers.startTimer("Classify");
>>>        ((PelletInfGraph) model.getGraph()).getKB().classify();
>>>        timers.stopTimer("Classify");
>>>
>>>        timers.startTimer("Realize");
>>>        ((PelletInfGraph) model.getGraph()).getKB().realize();
>>>        timers.stopTimer("Realize");
>>>
>>>        Individual i1 = model.getIndividual(
>>>                "http://www.jalojavier.es/humandisease.owl" + "#PR_SYMS_A_B_C" ); // PR_SYMS_A_B_C
>>>
>>>        int count = 0;
>>>
>>>        timers.startTimer("QueryTypes");
>>>        count += countStatements( model, i1, RDF.type, null );
>>>        timers.stopTimer("QueryTypes");
>>>
>>>        timers.startTimer("QueryProperties");
>>>        for( Iterator i = model.listOntProperties(); i.hasNext(); )
>>>            count += countStatements( model, i1, (Property) i.next(), null );
>>>        timers.stopTimer("QueryProperties");
>>>
>>>        timers.startTimer("QuerySames");
>>>        count += countStatements( model, i1, OWL.sameAs, null );
>>>        timers.stopTimer("QuerySames");
>>>
>>>        timers.startTimer("QueryDifferents");
>>>        count += countStatements( model, i1, OWL.differentFrom, null );
>>>        timers.stopTimer("QueryDifferents");
>>>
>>>        System.out.println( "Count for the first 4 queries: " + count );
>>>
>>>        timers.startTimer("QueryAll");
>>>        count = countStatements( model, i1, null, null );
>>>        timers.stopTimer("QueryAll");
>>>
>>>        System.out.println( "Count for the last query: " + count );
>>>
>>>        timers.print( true, null );
>>>    }
>>>
>>>    private int countStatements(OntModel m, Resource s, Property p, RDFNode o) {
>>>        int c = 0;
>>>        for( StmtIterator i = m.listStatements( s, p, o ); i.hasNext(); ) {
>>>            i.nextStatement();
>>>            c++;
>>>        }
>>>        return c;
>>>    }
>>>
>>> On 1/31/08 2:25 PM, Alejandro Rodríguez González wrote:
>>>> Hello,
>>>>
>>>> I asked a few days ago how to restrict the inference domain in 
>>>> Pellet, but no one answered, so I will try asking again (maybe no 
>>>> one understood me, I don't know).
>>>>
>>>> I have an ontology ( http://www.jalojavier.es/humandisease.owl ) 
>>>> and Jena+Pellet code ( http://rafb.net/p/k1yHie11.html ) to make 
>>>> the inferences.
>>>>
>>>> The problem is that the inference I try to make with the code 
>>>> mentioned takes a long time (close to 180 seconds).
>>>>
>>>> I ran some tests splitting the ontology into small parts, and I 
>>>> think the problem is the number of instances in the ontology 
>>>> (I now have close to 600 individuals and 1500 classes).
>>>>
>>>> I think it may be possible to improve the inference speed by 
>>>> restricting the inference domain. I have "Diseases", "Symptoms" 
>>>> and "Lab Test" superclasses that are involved in the inferences, 
>>>> but the results of the inferences are only subclasses of 
>>>> "Diseases" (which has only about 30 instances).
>>>>
>>>> Is it possible to tell Pellet to search for results only in this 
>>>> superclass and ignore the rest?
>>>>
>>>> Or is there any other solution that would speed up the inference?
>>>>
>>>> Thanks.
>>>> _______________________________________________
>>>> Pellet-users mailing list
>>>> Pellet-users at lists.owldl.com
>>>> http://lists.owldl.com/mailman/listinfo/pellet-users
>>>> _______________________________________________
>>>>
>>>> Sponsored by Clark & Parsia, LLC http://clarkparsia.com/
>>>>   
>>>
>>>
>>
>
>
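
Editorial sketch: the thread explains that a query with null in the predicate position returns the same statements as the union of separate queries for types, property assertions, sameAs and differentFrom. The toy Java below illustrates only that equivalence. It is an assumption-laden stand-in: it uses plain strings and an in-memory list, involves no reasoner, and none of its names (PatternQuerySketch, Triple, count, sampleStore, "hasSymptom", "i2") come from Jena, Pellet or the ontology in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: plain strings stand in for RDF resources and no
// reasoner is involved. None of these names come from Jena or Pellet.
class PatternQuerySketch {

    // One subject/predicate/object statement.
    static final class Triple {
        final String s, p, o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    // Count statements matching the pattern; a null argument matches anything.
    static int count(List<Triple> store, String s, String p, String o) {
        int c = 0;
        for (Triple t : store) {
            if ((s == null || s.equals(t.s))
                    && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o))) {
                c++;
            }
        }
        return c;
    }

    // Made-up statements about a made-up individual "i1".
    static List<Triple> sampleStore() {
        List<Triple> store = new ArrayList<>();
        store.add(new Triple("i1", "rdf:type", "Consult"));
        store.add(new Triple("i1", "hasSymptom", "SYM_A"));
        store.add(new Triple("i1", "hasSymptom", "SYM_B"));
        store.add(new Triple("i1", "owl:differentFrom", "i2"));
        store.add(new Triple("i2", "rdf:type", "Consult"));
        return store;
    }

    public static void main(String[] args) {
        List<Triple> store = sampleStore();
        // Four predicate-specific queries for the same subject...
        int split = count(store, "i1", "rdf:type", null)
                + count(store, "i1", "hasSymptom", null)
                + count(store, "i1", "owl:sameAs", null)
                + count(store, "i1", "owl:differentFrom", null);
        // ...versus one query with a wildcard predicate.
        int wildcard = count(store, "i1", null, null);
        System.out.println("Split queries:  " + split);    // 4
        System.out.println("Wildcard query: " + wildcard); // 4
    }
}
```

In this toy every lookup costs the same. The point made in the thread is that with a reasoner behind the model the predicates do not cost the same, so leaving out the ones you do not need (here, owl:differentFrom) avoids the expensive part of the wildcard query.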

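
Editorial sketch: the figures in the timing table quoted in this thread can be re-added with a few lines of Java. The helper below is hypothetical (TimingSummary, sum and quotedTimings are not part of Pellet or Jena); it only copies the millisecond values from the table and does the arithmetic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Pellet or Jena: it only re-adds the
// millisecond figures quoted in the timing table of this thread.
class TimingSummary {

    // Sum the timings (in milliseconds) recorded under the given names.
    static long sum(Map<String, Long> timings, String... names) {
        long total = 0;
        for (String name : names) {
            total += timings.get(name);
        }
        return total;
    }

    // The figures from the table, keyed by timer name.
    static Map<String, Long> quotedTimings() {
        Map<String, Long> t = new LinkedHashMap<>();
        t.put("Read", 8573L);
        t.put("Prepare", 1062L);
        t.put("Classify", 924L);
        t.put("Realize", 36002L);
        t.put("QueryTypes", 17L);
        t.put("QueryProperties", 6L);
        t.put("QuerySames", 0L);
        t.put("QueryDifferents", 32525L);
        t.put("QueryAll", 31842L);
        return t;
    }

    public static void main(String[] args) {
        Map<String, Long> t = quotedTimings();
        long oneTime = sum(t, "Read", "Prepare", "Classify", "Realize");
        long cheap = sum(t, "QueryTypes", "QueryProperties");
        long queryAll = t.get("QueryAll");
        System.out.println("One-time setup (ms):      " + oneTime);          // 46561
        System.out.println("Types + properties (ms):  " + cheap);            // 23
        System.out.println("QueryAll / cheap queries: " + queryAll / cheap); // 1384
    }
}
```

With these figures, the one-time operations add up to about 46.6 seconds, and the wildcard query is on the order of a thousand times slower than the type and property queries combined.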

