ks - Test Tools

The KnowledgeStore includes a pair of tools for testing the data retrieval performance of a KnowledgeStore instance:

The ks-test-driver tool simulates a number of concurrent clients each sending a sequence of request mixes to a populated KnowledgeStore instance. A request mix is obtained by instantiating a predefined set of parametric requests to the SPARQL and/or CRUD endpoints of the KnowledgeStore with actual parameter values. The requests of a mix are submitted sequentially by each simulated client, as a real user would do, and their execution time as well as the total throughput are measured by the ks-test-driver tool.
The ks-test-generator tool produces the parameter values used to instantiate the request mixes submitted by the driver. This is done by evaluating a set of SPARQL queries against the KnowledgeStore instance under test. Each query returns all the admissible values for a subset of parameters. These values are then joined and the result is sampled to produce a configurable number of parameter tuples (each giving rise to a request mix).
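As a rough illustration of the join-and-sample step described above, the following Python sketch joins two small result tables on the parameter they share and then draws a fixed number of tuples. The data, the function names join and sample_tuples, and the sampling strategy are hypothetical and chosen only for illustration; the actual generator may work differently.

```python
import random

def join(left, right):
    """Join two lists of dicts on the parameter names they have in common."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Index the right-hand rows by the values of the shared parameters.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in shared), []).append(row)
    # Merge every left-hand row with each right-hand row sharing those values.
    return [
        {**row, **match}
        for row in left
        for match in index.get(tuple(row[k] for k in shared), [])
    ]

def sample_tuples(rows, count, seed=0):
    """Draw up to `count` distinct tuples; the same seed gives the same draw."""
    rng = random.Random(seed)
    return rng.sample(rows, min(count, len(rows)))

# Hypothetical results of two queries that share the parameter 'event'.
events = [
    {"event": "ev1", "event_year": "2010"},
    {"event": "ev2", "event_year": "2012"},
]
event_types = [
    {"event": "ev1", "event_type": "sem:Event"},
    {"event": "ev1", "event_type": "ex:Meeting"},
    {"event": "ev3", "event_type": "ex:Election"},
]

joined = join(events, event_types)        # only 'ev1' appears in both inputs
tuples = sample_tuples(joined, 1, seed=0)
print(len(joined), len(tuples))
```

Events that lack a value for some parameter (here ev2 and ev3) drop out of the join, so they can never end up in a request mix.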

These tools are shipped in the two archives ks-distribution-1.7.1-tools.tar.gz and ks-distribution-1.7.1-server.tar.gz. They require Java 8 and use gzip for compression and decompression. They have been tested only on Linux and Mac OS X; in principle they should also work on Windows after installing the Unix Tools package, but we do not maintain batch scripts for this platform. Below we describe the tools and their usage in more detail, starting with the test driver for convenience of exposition, although in practice you would need to run the test generator first. Some reference information is also available by invoking the tools with the option --help.

Test Driver

The ks-test-driver tool accepts a single option -c that supplies the configuration file with all the settings necessary to run a test. The test consists of an optional warmup phase, where request mixes are submitted but their performance is not recorded, followed by a measurement phase where performance is actually recorded. The configuration file is a normal Java properties file as exemplified below, with comments describing the role of each property:

# the following properties provide the information required to connect to the KS under test
test.url=http://localhost:8080/
test.username=ks
test.password=kspass

# this is the file with the parameter values produced by the ks-test-generator tool
test.data=parameters.tsv.gz

# this is the trace file to be written as a result of the test
test.out=output.tsv.gz

# the seed controls the selection of parameters in the supplied file;
# same seed = same request mixes submitted to the KS
test.seed=0

# the number of concurrent clients to simulate (a thread is allocated to each of them)
test.clients=16

# the maximum number of request mixes and the maximum time in seconds for the warmup phase
test.warmupmixes=10000
test.warmuptime=120

# the maximum number of request mixes and the maximum time in seconds for the actual test phase
test.testmixes=20000
test.testtime=900

# the timeout in ms for each request submitted by a simulated client (the request is aborted if the timeout expires)
test.timeout=30000

# the list of parametric requests enabled for the test;
# each name in the comma separated list must have a corresponding set of describing properties in the file
test.queries=sparql1,crud1,crud2

# the specification of a SPARQL request consists of a name.type property (value must be 'sparql') and a name.query
# property with the SPARQL query string; parameters in the query string are denoted with ${parameter_name} tokens
# that are replaced at run time with actual parameter values
sparql1.type=sparql
sparql1.query=\
    SELECT ?actor (COUNT(DISTINCT ?event) AS ?count) ?comment \
    WHERE { \
      ?event sem:hasActor ?actor . \
      ?g1 dct:source <http://dbpedia.org/> \
      GRAPH ?g1 { ?actor a ${actor_type} } \
      ?g2 dct:source <http://dbpedia.org/> \
      GRAPH ?g2 { \
        ?actor rdfs:label ?label . \
        ?label bif:contains ${actor_term} \
      } \
      OPTIONAL { ?actor rdfs:comment ?comment } \
    } \
    GROUP BY ?actor ?comment \
    ORDER BY desc(?count) \
    LIMIT 20

# the specification of a metadata lookup operation that retrieves the description of a resource with all its mentions
# note the use of parameter ${resource} for mandatory property 'name.id'
crud1.type=lookupall
crud1.id=${resource}

# the specification of a download operation that retrieves the raw text of a resource
# note again the use of parameter ${resource} for mandatory property 'name.id'
crud2.type=download
crud2.id=${resource}

# more requests can be specified below; in order to use them in the test,
# you have to include their name in the value of property 'test.queries'
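The replacement of ${parameter_name} tokens described in the comments above can be pictured with a short Python sketch. The function name instantiate and the parameter value are hypothetical, and the sketch assumes that values are inserted verbatim into the request text; the driver's own substitution logic is not shown on this page and may differ.

```python
import re

# Matches tokens such as ${actor_type}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def instantiate(template, params):
    """Replace each ${name} token with params[name]; fail on unknown names."""
    def replace(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for parameter '{name}'")
        return params[name]
    return PLACEHOLDER.sub(replace, template)

# Hypothetical parameter tuple with a single value.
query = instantiate(
    "GRAPH ?g1 { ?actor a ${actor_type} }",
    {"actor_type": "dbo:Person"},
)
print(query)  # GRAPH ?g1 { ?actor a dbo:Person }
```

Raising an error for a missing parameter, rather than leaving the token in place, makes a mismatch between the request definitions and the parameters file visible before any request is sent.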

Running the ks-test-driver tool (here with a more complete configuration file) produces output like that shown below, ending with a table that reports the relevant metrics computed during the test:

$ ./ks-test-driver -c driver.properties
20:10:27.007(I) SUT: https://knowledgestore2.fbk.eu/nwr/wikinews/ (anonymous access)
20:10:27.009(I) 10000 mix(es), 30 s warmup; 20000 mix(es), 180 s test; 16 client(s)
20:10:27.017(I) Input schema: (event, event_year, mention, resource, event_type, event_term, actor, actor_related, actor_type, actor_term, actor_property)
20:10:33.093(I) Parsed /tmp/parameters.tsv.gz: 1000000 tuples (164744 tuple/s avg)
20:10:33.135(I) 14 queries enabled (16 defined): sparql1, sparql2, sparql3, sparql4, sparql5, sparql6, sparql7, sparql8, sparql9, sparql10, sparql11, sparql12, rmlookup, download
20:10:33.150(I) Output schema: 72 attributes
20:10:33.171(I) Test started
20:10:33.171(I) Warmup started (16 clients, 10000 mix(es), 14 queries/mix)
20:11:04.573(I) Completed 276 query mixes (10 mixes/s avg)
20:11:04.578(I) Warmup completed in 31394 ms (client time: 30080-31391 ms; client mixes: 15-19)
20:11:04.580(I) Measurement started (16 clients, 20000 mix(es), 14 queries/mix)
20:14:06.625(I) Completed 1758 query mixes (9 mixes/s avg)
20:14:06.626(I) Measurement completed in 182037 ms (client time: 180150-181968 ms; client mixes: 106-114)
20:14:06.626(I) Test completed in 213475 ms

               Executions        Result size [solutions, triples or bytes]                       Execution time [ms]                                             Total time [ms]        Rate
               Total   Error     Min      Q1      Q2      Q3     Max    Geom    Mean     Std     Min      Q1      Q2      Q3     Max    Geom    Mean     Std     Sum   Clock   Share    /Sec   /Hour
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sparql1         1758       0       1       1       2       5      20       3       5       6      13      86     191     289     960     148     202     138  354799   22325    0.12   78.75  283485
sparql2         1758       0       1       1       3       7      20       3       6       6      11      68     155     248     794     125     170     117  299702   18858    0.10   93.22  335603
sparql3         1758       0       1       1       2       3      20       2       3       2      14     167     239     290     696     201     230      99  404884   25477    0.14   69.00  248412
sparql4         1758       0       1       4      12      20      20       9      12       7      13      21      23      26     354      24      25      11   43792    2755    0.02  638.11 2297205
sparql5         1758       0       1       2       3       6      20       3       5       5      12      17      19      21      91      19      20       7   35184    2213    0.01  794.40 2859828
sparql6         1758       0       1      20      20      20      20      19      19       3      16      49     107     184     477      94     117      73  205461   12928    0.07  135.98  489542
sparql7         1758       0       1      20      20      20      20      17      19       4      15     168     238     289     588     201     229      96  402529   25328    0.14   69.41  249874
sparql8         1758       0       1      11      20      20      20      12      16       7      14      94     206     273     590     149     191     106  335295   21098    0.12   83.33  299972
sparql9         1758       0       4       5       7       9      91       7       8       6      11      22      24      27     260      25      26      11   46058    2898    0.02  606.63 2183851
sparql10        1758       0      19     100     209     657    7702     286     967    1954      11      17      22     104    1181      43     115     213  202636   12750    0.07  137.88  496376
sparql11        1758       0       1       2       5      14      20       5       8       7       9      13      15      17      98      15      16       8   28825    1813    0.01  969.66 3490789
sparql12        1758       0      11      96     305     398     851     196     267     163      11      19      31      70     205      37      48      35   84328    5306    0.03  331.32 1192763
rmlookup        1758       0      65    1575    2424    3793   16636    2090    2852    2023      15      58      83     119     389      80      93      52  164281   10337    0.06  170.07  612247
download        1758       0     263    1470    2124    3090   16468    2138    2470    1519      26     139     156     177     520     157     162      45  285171   17944    0.10   97.97  352697
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query (avg)    24612       0       1       4      20      99   16636      24     475    1263       9      22      71     187    1181      67     118     121 2892945  182037    1.00  135.20  486732
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query mix       1758                                                                             460    1339    1608    1887    3735    1585    1646     448 2892945  182037    1.00    9.66   34767
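The rate columns of the table can be checked by hand. The Python sketch below recomputes the /Sec and /Hour values for the sparql1 and query mix rows, and the Share value for sparql1, from the other figures in the same rows. The interpretation of the columns (rate = executions divided by clock time, share = row time sum divided by total time sum) is inferred from the numbers in this sample output, not taken from a specification of the tool.

```python
def rates(executions, clock_ms):
    """Return (requests per second, requests per hour) for a table row."""
    per_sec = executions / (clock_ms / 1000.0)
    return per_sec, per_sec * 3600.0

# Values copied from the sparql1 and 'query mix' rows of the table above.
sparql1_sec, sparql1_hour = rates(1758, 22325)
mix_sec, mix_hour = rates(1758, 182037)

# Share of total time: Sum of the sparql1 row over the Sum of the 'query (avg)' row.
sparql1_share = 354799 / 2892945

print(f"sparql1:   {sparql1_sec:.2f}/s  {sparql1_hour:.0f}/h  share {sparql1_share:.2f}")
print(f"query mix: {mix_sec:.2f}/s  {mix_hour:.0f}/h")
```

The recomputed figures match the table (78.75 and 283485 for sparql1, 9.66 and 34767 for the query mix), which supports this reading of the columns.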

Test Generator

Like the driver, the ks-test-generator tool accepts a single option -c that supplies the configuration file with all the settings necessary to run the parameter generation process. The configuration file is a normal Java properties file as exemplified below, with comments describing the role of each property:

# the following properties provide the information required to connect to the KS under test
test.url=http://localhost:8080/
test.username=ks
test.password=kspass

# the number of parameter tuples (i.e., request mixes) to generate
test.mixes=1000000

# the output file where to write generated tuples
test.out=data_1m.tsv.gz

# the names of the SPARQL SELECT queries to evaluate, each one producing the admissible values for a subset of parameters,
# which are then joined by the tool; the SELECT clause of each query specifies the names of the parameters whose values are
# returned by the query, with parameters in common to queries previously listed in 'test.queries' coming first (this
# constraint is exploited by the generator tool to implement the join of query results more easily and efficiently)
test.queries=q1,q2

# each query must have two properties 'name.file' and 'name.query', the first providing the name of the file in which to store
# data obtained by the query evaluation, the second specifying the query string; note that if the file already exists, its
# content is used as is and the query is not evaluated against the KnowledgeStore (this mechanism allows reusing the results
# of expensive queries)
q1.file=events.tsv.gz
q1.query=\
  SELECT ?event ?event_year ?mention ?resource \
  WHERE { \
    { \
      SELECT ?event (MIN(?y) AS ?event_year) \
      WHERE { \
        ?event sem:hasTime ?t . \
        ?t owltime:inDateTime ?d . \
        ?d owltime:year ?y . \
        FILTER EXISTS { \
          ?event sem:hasActor ?actor , ?actor2 . \
          ?actor a dbo:Person . \
          ?actor2 a dbo:Person . \
          FILTER (?actor != ?actor2) \
        } \
      } \
      GROUP BY ?event \
    } \
    { \
      SELECT ?event (SAMPLE(?m) AS ?mention) \
      WHERE { \
        ?event gaf:denotedBy ?m \
      } \
      GROUP BY ?event \
    } \
    BIND (IRI(STRBEFORE(STR(?mention), "#")) AS ?resource) \
  }

# other queries here, if necessary
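The reuse mechanism described for 'name.file' (use the file if it exists, otherwise evaluate the query and store its results) follows a common caching pattern. The Python sketch below shows that pattern for a gzip-compressed TSV file. The function name load_or_evaluate, the headerless file layout and the stand-in query function are assumptions made for the example, not the generator's actual code or file format.

```python
import gzip
import os
import tempfile

def load_or_evaluate(path, evaluate):
    """Return the rows stored in `path`, or call evaluate() and store its rows."""
    if os.path.exists(path):
        # File already present: reuse its content and skip the query.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return [line.rstrip("\n").split("\t") for line in f]
    rows = evaluate()
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")
    return rows

calls = []

def fake_query():
    """Stand-in for an expensive SPARQL query; records each invocation."""
    calls.append(1)
    return [["ev1", "2010"], ["ev2", "2012"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.tsv.gz")
    first = load_or_evaluate(path, fake_query)   # evaluates and writes the file
    second = load_or_evaluate(path, fake_query)  # reads the file, no evaluation

print(len(calls))  # 1
```

One consequence of this pattern is that a stale file is silently reused, so the file has to be deleted whenever the query or the underlying data changes.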

The listing below shows an example of the output produced by the ks-test-generator tool.

$ ./ks-test-generator -c generator.properties
19:49:48.065(I) SUT: https://knowledgestore2.fbk.eu/nwr/wikinews/ (anonymous access)
19:49:48.070(I) 1000000 mix(es) to be written to /tmp/parameters.tsv.gz
19:49:48.077(I) 7 queries enabled (7 defined): q1, q2, q3, q4, q5, q6, q7
19:50:18.264(I) Evaluated query q1: 4797 tuples (793 tuple/s avg)
19:50:18.328(I) Parsed /tmp/events.tsv.gz (event, event_year, mention, resource): 4797 tuples (97897 tuple/s avg)
19:50:21.893(I) Evaluated query q2: 38153 tuples (13606 tuple/s avg)
19:50:22.042(I) Parsed /tmp/event_types.tsv.gz (event, event_type): 38153 tuples (264951 tuple/s avg)
19:50:23.315(I) Evaluated query q3: 8860 tuples (12549 tuple/s avg)
19:50:23.360(I) Parsed /tmp/event_terms.tsv.gz (event, event_term): 8860 tuples (227179 tuple/s avg)
19:50:24.664(I) Evaluated query q4: 15244 tuples (17623 tuple/s avg)
19:50:24.754(I) Parsed /tmp/event_actors.tsv.gz (event, actor, actor_related): 15244 tuples (175218 tuple/s avg)
19:50:25.886(I) Evaluated query q5: 5372 tuples (82646 tuple/s avg)
19:50:25.896(I) Parsed /tmp/actor_types.tsv.gz (actor, actor_type): 5372 tuples (671500 tuple/s avg)
19:50:27.706(I) Evaluated query q6: 4885 tuples (119146 tuple/s avg)
19:50:27.729(I) Parsed /tmp/actor_terms.tsv.gz (actor, actor_term): 4885 tuples (271388 tuple/s avg)
19:50:30.411(I) Evaluated query q7: 73491 tuples (35606 tuple/s avg)
19:50:30.591(I) Parsed /tmp/actor_properties.tsv.gz (actor, actor_property): 73491 tuples (417562 tuple/s avg)
19:50:30.611(I) Output schema: (event, event_year, mention, resource, event_type, event_term, actor, actor_related, actor_type, actor_term, actor_property)
19:50:37.111(I) Generated 1000000 tuples (154012 tuple/s avg)
19:50:37.112(I) Tuple generation statistics: 586546 attempts failed, 386651 duplicates
19:51:02.445(I) Written /tmp/parameters.tsv.gz (event, event_year, mention, resource, event_type, event_term, actor, actor_related, actor_type, actor_term, actor_property): 1000000 tuples (39513 tuple/s avg)

Usage example

You may try the test tools on the publicly available WikiNews KnowledgeStore instance, using this generator configuration and driver configuration. Instructions:

download and extract the archive ks-distribution-1.7.1-tools.tar.gz
execute /path/to/knowledgestore/bin/ks-test-generator -c /path/to/generator.properties to generate the parameters file parameters.tsv.gz
if necessary, move parameters.tsv.gz to the same directory as driver.properties (or modify the latter to point to the parameters file)
execute /path/to/knowledgestore/bin/ks-test-driver -c /path/to/driver.properties to run the test (30 s warmup, 180 s measurement) and obtain a table similar to the one reported on this page; you will also get a file traces.tsv.gz with detailed information on all the requests submitted to the KnowledgeStore
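If you want to script the two execution steps above, for example to repeat a test with different settings, the command lines can be assembled as in the following Python sketch. The helper name build_commands is hypothetical and the paths are the placeholders used in the instructions. The sketch only builds the argument lists and does not launch the tools.

```python
from pathlib import PurePosixPath

def build_commands(ks_home, generator_props, driver_props):
    """Return the generator and driver command lines, in execution order."""
    bin_dir = PurePosixPath(ks_home) / "bin"
    return [
        [str(bin_dir / "ks-test-generator"), "-c", str(generator_props)],
        [str(bin_dir / "ks-test-driver"), "-c", str(driver_props)],
    ]

commands = build_commands(
    "/path/to/knowledgestore",
    "/path/to/generator.properties",
    "/path/to/driver.properties",
)
for command in commands:
    print(" ".join(command))
```

Each list can then be passed to subprocess.run(command, check=True), so that the driver is not started if the generator fails.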