This page provides instructions for querying the ArtNet data.
There are two ways to access the data:
- download the RDF datasets from the Data page and use them with any RDF-enabled tool (triple stores, reasoners, …),
- download the full data set in Sesame native store format and use the querying tools in SemNet to access the data.
Setup
- Download and extract the full data set from the Data page (includes the binaries).
- The archive contains the following folders:
  - data
    - Contains the PostgreSQL database dump of the HTML crawler's state (the collected URLs) and the Sesame native store files with all the collected and inferred data.
  - doc
    - Javadoc for SemNet and PipedObjectProcessor.
  - proj
    - Java sources, NetBeans projects and Ivy files for SemNet, ArtNetScrapers and PipedObjectProcessor.
  - sample
    - This folder is the main entry point to the program. It contains the binaries of SemNet and all the required libraries, configuration files for ArtNet and executable scripts (Windows batch files).
For use on systems other than Windows, the scripts can easily be adjusted by replacing the classpath separator character (";" on Windows, ":" on Unix-like systems) and the variable argument count placeholder ("%*" on Windows, "$@" in Unix shells).
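The adjustment described above can be sketched as a small Java helper. This is only an illustration: the class name BatchToShell, the jar names and the main class example.Main are hypothetical, and the real batch files in the sample folder will look different. The replacement is deliberately naive (it rewrites every ";" on the line), so the result should be checked by hand.

```java
public class BatchToShell {

    /**
     * Converts one line of a Windows batch script to a Unix shell equivalent
     * by swapping the classpath separator and the "all arguments" placeholder.
     * Naive by design: every ';' on the line is replaced, not only those
     * inside the classpath.
     */
    public static String toUnix(String batchLine) {
        return batchLine
                .replace(";", ":")          // classpath separator: ';' -> ':'
                .replace("%*", "\"$@\"");   // all script arguments: %* -> "$@"
    }

    public static void main(String[] args) {
        // Hypothetical batch line, used only for illustration.
        String bat = "java -cp \"semnet.jar;lib/*\" example.Main %*";
        System.out.println(toUnix(bat));
    }
}
```

For the hypothetical input above, the helper yields a line that uses ":" between classpath entries and a quoted "$@" in place of "%*".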
Querying
In the sample directory, execute query.bat. It takes the parameters <conf> <operation> [args],
where <conf> is a path to the configuration file of the repository (the triple store) and <operation> is one of the operations offered by the QueryInterface (which may optionally take some [args]). The operations are as follows:
- has <label>
  - Returns true if the repository contains a resource with an rdfs:label equal to <label>.
- what <label>
  - Returns a list of resources with the given <label>.
- type, fulltype <uri>
  - These two commands both take a URI of a resource and return the rdf:types of the resource, either the direct type or the full type (the class hierarchy), respectively.
- describe <uri>
  - Returns all statements containing <uri> as the subject or the object.
- count
  - Returns the number of explicit (not inferred) statements currently present in the repository.
- tquery <query_file>
  - Evaluates the SeRQL tuple query from the <query_file> and returns the resulting binding sets.
- gquery <query_file> <output_file>
  - Evaluates the SeRQL graph query from the <query_file> and writes the resulting RDF graph to the <output_file>.
- rdfdump <dump_file>
  - Dumps the contents of the repository to the <dump_file> in RDF format.
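When calling the script from another program, it can help to check the operation name and its argument count before launching it. The following is a minimal sketch of such a check, not part of SemNet: the class QueryCommand is hypothetical, the argument counts are read off the list above, and the parameter order (configuration, operation, arguments) is an assumption based on the description of query.bat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryCommand {

    // Number of arguments each operation takes, as read from the list above.
    private static final Map<String, Integer> ARITY = new HashMap<>();
    static {
        ARITY.put("has", 1);
        ARITY.put("what", 1);
        ARITY.put("type", 1);
        ARITY.put("fulltype", 1);
        ARITY.put("describe", 1);
        ARITY.put("count", 0);
        ARITY.put("tquery", 1);
        ARITY.put("gquery", 2);
        ARITY.put("rdfdump", 1);
    }

    /**
     * Builds the command line for query.bat, assuming the order
     * <conf> <operation> [args]. Rejects unknown operations and
     * wrong argument counts before anything is executed.
     */
    public static List<String> build(String conf, String operation, String... args) {
        Integer expected = ARITY.get(operation);
        if (expected == null) {
            throw new IllegalArgumentException("Unknown operation: " + operation);
        }
        if (args.length != expected) {
            throw new IllegalArgumentException(
                    operation + " takes " + expected + " argument(s), got " + args.length);
        }
        List<String> command = new ArrayList<>();
        command.add("query.bat");
        command.add(conf);
        command.add(operation);
        command.addAll(Arrays.asList(args));
        return command;
    }

    public static void main(String[] args) {
        // File names below are placeholders, not files shipped in the archive.
        System.out.println(build("repo.conf", "gquery", "query.serql", "result.rdf"));
    }
}
```

The returned list could then be handed to java.lang.ProcessBuilder, with the sample directory set as the working directory.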
So, for example, to execute one of the sample SeRQL tuple queries located in sample/serql over the sample repository configured in sample/artnet (whose data are located in data/sesameData), run query.bat with that repository's configuration file as <conf>, tquery as the operation and the path to the query file as its argument.
One of the sample queries outputs all of the movies starring Bolek Polívka. The sample/serql directory includes more examples of tuple queries. Some examples of graph queries can be seen on the Data page. For more information on the SeRQL language, see the SeRQL reference.
indexes=psoc,opsc,spoc