Test Tools
The KnowledgeStore includes a pair of tools for testing the data retrieval performance of a KnowledgeStore instance:
- The ks-test-driver tool simulates a number of concurrent clients each sending a sequence of request mixes to a populated KnowledgeStore instance. A request mix is obtained by instantiating a predefined set of parametric requests to the SPARQL and/or CRUD endpoints of the KnowledgeStore with actual parameter values. The requests of a mix are submitted sequentially by each simulated client, as a real user would do, and their execution time as well as the total throughput are measured by the ks-test-driver tool.
- The ks-test-generator tool produces the parameter values used to instantiate the request mixes used by the driver. This is done by evaluating a set of SPARQL queries against the KnowledgeStore instance under test. Each query returns all the admissible values for a subset of parameters. These values are then joined and the result is sampled to produce a configurable number of parameter tuples (each giving rise to a request mix).
These tools are shipped in the two archives ks-distribution-1.7.1-tools.tar.gz and ks-distribution-1.7.1-server.tar.gz. They require Java 8 and use gzip for compression/decompression; they have been tested only on Linux and Mac OS X (in principle they should also work on Windows after installing the Unix Tools package, but we do not maintain batch scripts for this platform). In the following we describe the tools and their usage in more detail, starting from the test driver for convenience of exposition, although in practice you would need to use the test generator first. Note that some reference information is also available by invoking these tools with the option --help.
Test Driver
The ks-test-driver tool accepts a single option -c that supplies the configuration file with all the settings necessary to run a test. The test consists of an optional warmup phase, where request mixes are submitted but their performance is not recorded, followed by a measurement phase where performance is actually recorded. The configuration file is a normal Java properties file as exemplified below, with comments describing the role of each property:
    # the following properties provide the information required to connect to the KS under test
    test.url=http://localhost:8080/
    test.username=ks
    test.password=kspass

    # this is the file with the parameter values produced by the ks-test-generator tool
    test.data=parameters.tsv.gz

    # this is the trace file to be written as a result of the test
    test.out=output.tsv.gz

    # the seed controls the selection of parameters in the supplied file;
    # same seed = same request mixes submitted to the KS
    test.seed=0

    # the number of concurrent clients to simulate (a thread is allocated to each of them)
    test.clients=16

    # the maximum number of request mixes and the maximum time in seconds for the warmup phase
    test.warmupmixes=10000
    test.warmuptime=120

    # the maximum number of request mixes and the maximum time in seconds for the actual test phase
    test.testmixes=20000
    test.testtime=900

    # the timeout in ms for each request submitted by a simulated client (the request is aborted if the timeout expires)
    test.timeout=30000

    # the list of parametric requests enabled for the test;
    # each name in the comma separated list must have a corresponding set of describing properties in the file
    test.queries=sparql1,crud1,crud2

    # the specification of a SPARQL request consists of a name.type property (value must be 'sparql') and a name.query
    # property with the SPARQL query string; parameters in the query string are denoted with ${parameter_name} tokens
    # that are replaced at run time with actual parameter values
    sparql1.type=sparql
    sparql1.query=\
      SELECT ?actor (COUNT(DISTINCT ?event) AS ?count) ?comment \
      WHERE { \
        ?event sem:hasActor ?actor . \
        ?g1 dct:source <http://dbpedia.org/> \
        GRAPH ?g1 { ?actor a ${actor_type} } \
        ?g2 dct:source <http://dbpedia.org/> \
        GRAPH ?g2 { \
          ?actor rdfs:label ?label . \
          ?label bif:contains ${actor_term} \
        } \
        OPTIONAL { ?actor rdfs:comment ?comment } \
      } \
      GROUP BY ?actor ?comment \
      ORDER BY desc(?count) \
      LIMIT 20

    # the specification of a metadata lookup operation that retrieves the description of a resource with all its mentions
    # note the use of parameter ${resource} for mandatory property 'name.id'
    crud1.type=lookupall
    crud1.id=${resource}

    # the specification of a download operation that retrieves the raw text of a resource
    # note again the use of parameter ${resource} for mandatory property 'name.id'
    crud2.type=download
    crud2.id=${resource}

    # more requests can be specified below; in order to use them in the test,
    # you have to include their name in the value of property 'test.queries'
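To illustrate how ${parameter_name} tokens can be replaced with the values of a parameter tuple, here is a minimal Python sketch based on the standard string.Template class, which happens to use the same ${name} syntax. It is not the driver's actual implementation: the shortened query, the sample values and the helper name instantiate are assumptions made for illustration only.

```python
from string import Template

# Shortened, hypothetical version of the sparql1 query from the configuration above.
QUERY = Template(
    "SELECT ?actor WHERE { "
    "?actor a ${actor_type} . "
    "?actor rdfs:label ?label . "
    "?label bif:contains ${actor_term} } LIMIT 20"
)


def instantiate(template, parameters):
    """Replace every ${name} token with the value found in the parameter tuple."""
    # substitute() raises KeyError when a token has no value,
    # so an incomplete tuple fails loudly instead of producing a broken query.
    return template.substitute(parameters)


# Made-up parameter tuple; real values come from the file written by ks-test-generator.
row = {
    "actor_type": "dbo:Person",
    "actor_term": "'obama'",
    "resource": "<http://example.org/doc>",
}
print(instantiate(QUERY, row))
```

Values that are not referenced by a request (here, resource) are simply ignored, which matches the idea of a single tuple feeding all the requests of a mix.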
The execution of the ks-test-driver tool (with a more complete configuration file) produces an output like the one shown below, with a final table reporting all the relevant test metrics computed during the test execution:
    $ ./ks-test-driver -c driver.properties
    20:10:27.007(I) SUT: https://knowledgestore2.fbk.eu/nwr/wikinews/ (anonymous access)
    20:10:27.009(I) 10000 mix(es), 30 s warmup; 20000 mix(es), 180 s test; 16 client(s)
    20:10:27.017(I) Input schema: (event, event_year, mention, resource, event_type, event_term, actor, actor_related, actor_type, actor_term, actor_property)
    20:10:33.093(I) Parsed /tmp/parameters.tsv.gz: 1000000 tuples (164744 tuple/s avg)
    20:10:33.135(I) 14 queries enabled (16 defined): sparql1, sparql2, sparql3, sparql4, sparql5, sparql6, sparql7, sparql8, sparql9, sparql10, sparql11, sparql12, rmlookup, download
    20:10:33.150(I) Output schema: 72 attributes
    20:10:33.171(I) Test started
    20:10:33.171(I) Warmup started (16 clients, 10000 mix(es), 14 queries/mix)
    20:11:04.573(I) Completed 276 query mixes (10 mixes/s avg)
    20:11:04.578(I) Warmup completed in 31394 ms (client time: 30080-31391 ms; client mixes: 15-19)
    20:11:04.580(I) Measurement started (16 clients, 20000 mix(es), 14 queries/mix)
    20:14:06.625(I) Completed 1758 query mixes (9 mixes/s avg)
    20:14:06.626(I) Measurement completed in 182037 ms (client time: 180150-181968 ms; client mixes: 106-114)
    20:14:06.626(I) Test completed in 213475 ms

                  Executions     Result size [solutions, triples or bytes]         Execution time [ms]                               Total time [ms]         Rate
                 Total Error     Min    Q1    Q2    Q3   Max  Geom  Mean   Std     Min    Q1    Q2    Q3   Max  Geom  Mean   Std       Sum   Clock Share     /Sec    /Hour
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    sparql1       1758     0       1     1     2     5    20     3     5     6      13    86   191   289   960   148   202   138    354799   22325  0.12    78.75   283485
    sparql2       1758     0       1     1     3     7    20     3     6     6      11    68   155   248   794   125   170   117    299702   18858  0.10    93.22   335603
    sparql3       1758     0       1     1     2     3    20     2     3     2      14   167   239   290   696   201   230    99    404884   25477  0.14    69.00   248412
    sparql4       1758     0       1     4    12    20    20     9    12     7      13    21    23    26   354    24    25    11     43792    2755  0.02   638.11  2297205
    sparql5       1758     0       1     2     3     6    20     3     5     5      12    17    19    21    91    19    20     7     35184    2213  0.01   794.40  2859828
    sparql6       1758     0       1    20    20    20    20    19    19     3      16    49   107   184   477    94   117    73    205461   12928  0.07   135.98   489542
    sparql7       1758     0       1    20    20    20    20    17    19     4      15   168   238   289   588   201   229    96    402529   25328  0.14    69.41   249874
    sparql8       1758     0       1    11    20    20    20    12    16     7      14    94   206   273   590   149   191   106    335295   21098  0.12    83.33   299972
    sparql9       1758     0       4     5     7     9    91     7     8     6      11    22    24    27   260    25    26    11     46058    2898  0.02   606.63  2183851
    sparql10      1758     0      19   100   209   657  7702   286   967  1954      11    17    22   104  1181    43   115   213    202636   12750  0.07   137.88   496376
    sparql11      1758     0       1     2     5    14    20     5     8     7       9    13    15    17    98    15    16     8     28825    1813  0.01   969.66  3490789
    sparql12      1758     0      11    96   305   398   851   196   267   163      11    19    31    70   205    37    48    35     84328    5306  0.03   331.32  1192763
    rmlookup      1758     0      65  1575  2424  3793 16636  2090  2852  2023      15    58    83   119   389    80    93    52    164281   10337  0.06   170.07   612247
    download      1758     0     263  1470  2124  3090 16468  2138  2470  1519      26   139   156   177   520   157   162    45    285171   17944  0.10    97.97   352697
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    query (avg)  24612     0       1     4    20    99 16636    24   475  1263       9    22    71   187  1181    67   118   121   2892945  182037  1.00   135.20   486732
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    query mix     1758                                                             460  1339  1608  1887  3735  1585  1646   448   2892945  182037  1.00     9.66    34767
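The Share, Clock and Rate columns appear to be derived from the other columns of the table. The Python sketch below reproduces the sparql1 row under that assumption. The formulas are inferred from the reported numbers and are not taken from the tool's documentation or source code, so treat them as a reading aid rather than a specification.

```python
# Values copied from the table above.
executions = 1758        # sparql1, Executions / Total
sum_ms = 354799          # sparql1, Total time / Sum
clock_ms = 22325         # sparql1, Total time / Clock
total_sum_ms = 2892945   # query (avg), Total time / Sum
total_clock_ms = 182037  # query (avg), Total time / Clock (measurement duration)

# Assumed: Share is the fraction of the summed execution time spent in this request.
share = sum_ms / total_sum_ms

# Assumed: Clock apportions the wall-clock duration of the measurement phase by Share.
estimated_clock_ms = share * total_clock_ms

# Assumed: Rate is the number of executions per second (or per hour) of Clock time.
rate_per_sec = executions / (clock_ms / 1000)
rate_per_hour = rate_per_sec * 3600

print(f"share={share:.2f}  estimated clock={estimated_clock_ms:.0f} ms")
print(f"rate={rate_per_sec:.2f}/s  {rate_per_hour:.0f}/h")
```

With these inputs the computed share and rates agree with the reported 0.12, 78.75 and 283485, and the estimated clock time is within 1 ms of the reported 22325.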
Test Generator
Like the driver, the ks-test-generator tool accepts a single option -c that supplies the configuration file with all the settings necessary to run the parameter generation process. The configuration file is a normal Java properties file as exemplified below, with comments describing the role of each property:
    # the following properties provide the information required to connect to the KS under test
    test.url=http://localhost:8080/
    test.username=ks
    test.password=kspass

    # the number of parameter tuples (i.e., request mixes) to generate
    test.mixes=1000000

    # the output file where to write generated tuples
    test.out=data_1m.tsv.gz

    # the names of the SPARQL SELECT queries to evaluate, each one producing the admissible values for a subset of parameters,
    # which are then joined by the tool; the SELECT clause of each query specifies the names of the parameters whose values are
    # returned by the query, with parameters in common to queries previously listed in 'test.queries' coming first (this
    # constraint is exploited by the generator tool to implement the join of query results more easily and efficiently)
    test.queries=q1,q2

    # each query must have two properties 'name.file' and 'name.query', the first providing the name of the file where to store
    # data obtained by the query evaluation, the second specifying the query string; note that if the file already exists, its
    # content is used as is and the query is not evaluated against the KnowledgeStore (this mechanism allows reusing the results
    # of expensive queries)
    q1.file=events.tsv.gz
    q1.query=\
      SELECT ?event ?event_year ?mention ?resource \
      WHERE { \
        { \
          SELECT ?event (MIN(?y) AS ?event_year) \
          WHERE { \
            ?event sem:hasTime ?t . \
            ?t owltime:inDateTime ?d . \
            ?d owltime:year ?y . \
            FILTER EXISTS { \
              ?event sem:hasActor ?actor , ?actor2 . \
              ?actor a dbo:Person . \
              ?actor2 a dbo:Person . \
              FILTER (?actor != ?actor2) \
            } \
          } \
          GROUP BY ?event \
        } \
        { \
          SELECT ?event (SAMPLE(?m) AS ?mention) \
          WHERE { \
            ?event gaf:denotedBy ?m \
          } \
          GROUP BY ?event \
        } \
        BIND (IRI(STRBEFORE(STR(?mention), "#")) AS ?resource) \
      }

    # other queries here, if necessary
The listing below shows an example of the output produced by the ks-test-generator tool.
    $ ./ks-test-generator -c generator.properties
    19:49:48.065(I) SUT: https://knowledgestore2.fbk.eu/nwr/wikinews/ (anonymous access)
    19:49:48.070(I) 1000000 mix(es) to be written to /tmp/parameters.tsv.gz
    19:49:48.077(I) 7 queries enabled (7 defined): q1, q2, q3, q4, q5, q6, q7
    19:50:18.264(I) Evaluated query q1: 4797 tuples (793 tuple/s avg)
    19:50:18.328(I) Parsed /tmp/events.tsv.gz (event, event_year, mention, resource): 4797 tuples (97897 tuple/s avg)
    19:50:21.893(I) Evaluated query q2: 38153 tuples (13606 tuple/s avg)
    19:50:22.042(I) Parsed /tmp/event_types.tsv.gz (event, event_type): 38153 tuples (264951 tuple/s avg)
    19:50:23.315(I) Evaluated query q3: 8860 tuples (12549 tuple/s avg)
    19:50:23.360(I) Parsed /tmp/event_terms.tsv.gz (event, event_term): 8860 tuples (227179 tuple/s avg)
    19:50:24.664(I) Evaluated query q4: 15244 tuples (17623 tuple/s avg)
    19:50:24.754(I) Parsed /tmp/event_actors.tsv.gz (event, actor, actor_related): 15244 tuples (175218 tuple/s avg)
    19:50:25.886(I) Evaluated query q5: 5372 tuples (82646 tuple/s avg)
    19:50:25.896(I) Parsed /tmp/actor_types.tsv.gz (actor, actor_type): 5372 tuples (671500 tuple/s avg)
    19:50:27.706(I) Evaluated query q6: 4885 tuples (119146 tuple/s avg)
    19:50:27.729(I) Parsed /tmp/actor_terms.tsv.gz (actor, actor_term): 4885 tuples (271388 tuple/s avg)
    19:50:30.411(I) Evaluated query q7: 73491 tuples (35606 tuple/s avg)
    19:50:30.591(I) Parsed /tmp/actor_properties.tsv.gz (actor, actor_property): 73491 tuples (417562 tuple/s avg)
    19:50:30.611(I) Output schema: (event, event_year, mention, resource, event_type, event_term, actor, actor_related, actor_type, actor_term, actor_property)
    19:50:37.111(I) Generated 1000000 tuples (154012 tuple/s avg)
    19:50:37.112(I) Tuple generation statistics: 586546 attempts failed, 386651 duplicates
    19:51:02.445(I) Written /tmp/parameters.tsv.gz (event, event_year, mention, resource, event_type, event_term, actor, actor_related, actor_type, actor_term, actor_property): 1000000 tuples (39513 tuple/s avg)
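The join-then-sample idea behind the generator can be sketched in a few lines of Python. This is only a conceptual illustration on tiny in-memory tables: the function names natural_join and sample_tuples, the seed argument and the sample rows are hypothetical, and the real tool (which reports failed attempts and duplicates) most likely uses a different, more scalable strategy.

```python
import random


def natural_join(left, right):
    """Join two lists of dicts on the parameter names they have in common."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Index the right-hand rows by the values of the shared parameters.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in shared), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in shared), []):
            joined.append({**row, **match})
    return joined


def sample_tuples(tables, count, seed=0):
    """Join all query results, then draw at most `count` distinct tuples."""
    joined = tables[0]
    for table in tables[1:]:
        joined = natural_join(joined, table)
    rng = random.Random(seed)
    return rng.sample(joined, min(count, len(joined)))


# Made-up results of two queries sharing the 'event' parameter.
events = [
    {"event": "e1", "event_year": "2010"},
    {"event": "e2", "event_year": "2011"},
]
event_types = [
    {"event": "e1", "event_type": "t1"},
    {"event": "e1", "event_type": "t2"},
]
for row in sample_tuples([events, event_types], count=5):
    print(row)
```

Event e2 has no admissible event_type, so it never appears in a generated tuple; only combinations that are valid for every query survive the join.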
Usage example
You may try the test tools on the publicly available WikiNews KnowledgeStore instance, using this generator configuration and driver configuration. Instructions:
- download and extract the archive ks-distribution-1.7.1-tools.tar.gz
- execute /path/to/knowledgestore/bin/ks-test-generator -c /path/to/generator.properties to generate the parameters file parameters.tsv.gz
- if necessary, move parameters.tsv.gz into the same directory as driver.properties (or modify the latter to point to the parameters file)
- execute /path/to/knowledgestore/bin/ks-test-driver -c /path/to/driver.properties to run the test (30 s warmup, 180 s measurement) and obtain a table similar to the one reported on this page; you will also get a file traces.tsv.gz with detailed information on all the requests submitted to the KnowledgeStore
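Once the test has run, the gzip-compressed TSV files can be inspected with a few lines of Python. The sketch below assumes that the first line of the file lists the attribute names, which is consistent with the schemas printed in the logs but is not stated explicitly here. The helper name read_tsv_gz and the two-column sample file are made up for the demonstration.

```python
import csv
import gzip
import os
import tempfile


def read_tsv_gz(path):
    """Return (header, rows) from a gzip-compressed, tab-separated file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as stream:
        reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)  # assumption: the first line holds the attribute names
        rows = list(reader)
    return header, rows


# Demo on a tiny made-up file; point the function at traces.tsv.gz
# or parameters.tsv.gz to inspect real data.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.tsv.gz")
    with gzip.open(path, "wt", encoding="utf-8", newline="") as stream:
        stream.write("resource\ttime_ms\nhttp://example.org/doc\t42\n")
    header, rows = read_tsv_gz(path)
    print(header, len(rows))
```

Loading all rows into memory is fine for a quick look; for a file with a million tuples it may be preferable to iterate over the reader instead of building a list.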