ArtNet is a semantic network of works of art created using SemNet.
It contains data collected from ČSFD.cz and DatabazeKnih.cz during May 2011, in the extent of

and millions of relationships. Entities in ArtNet are instances of WordNet "classes" (synsets).

Schema

The schema of the collected data is defined in the scrapers, which can be downloaded from the Scrapers section, and has roughly the following structure:

Film:
Actor/Director:
Book:
Short story:
Writer:

Configuration

This section contains the configuration files that were used to collect the data. These files are used by SemNet's JobRunner and processors.

job.xml
Required by JobRunner; defines the processor chain.
crawler.xml
Crawler configuration; defines hosts for crawling and their parameters.
bootstrap.list
The list of URLs used to bootstrap the crawling process.
wn_map.xml
Defines mappings from the terms of the scrapers' temporary vocabulary to the vocabulary of the built network (WordNet terms in this case).
sesame.properties
Configuration options for the connection to a Sesame store.
wordnet-hyponym.rdf
The WordNet 2.0 hyponymy set, which was used to bootstrap the triple store so that it can serve as a class hierarchy.
wn_as_class_hierarchy.rdf
Contains two RDF statements which enable the use of WordNet as a class hierarchy.
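
The interplay of wn_map.xml and the hyponymy files can be illustrated with a small, self-contained sketch. It is not SemNet code: the class name, the method names and the "wn:" identifiers are hypothetical, and the tiny hierarchy is invented for illustration rather than taken from WordNet 2.0 or from the files above. The sketch assumes that a scraper term is first mapped to a class and that hyponymy links are then followed upwards as if they were subclass links.

```java
import java.util.HashMap;
import java.util.Map;

public class WordNetMappingSketch {
    // Hypothetical stand-in for wn_map.xml: temporary scraper term -> class of the built network.
    static final Map<String, String> TERM_TO_CLASS = new HashMap<>();
    // Hypothetical stand-in for the hyponymy set: class -> its direct hypernym (parent class).
    static final Map<String, String> HYPERNYM_OF = new HashMap<>();

    static {
        // Invented identifiers, NOT real WordNet 2.0 synset names.
        TERM_TO_CLASS.put("Film", "wn:movie");
        TERM_TO_CLASS.put("Book", "wn:book");
        HYPERNYM_OF.put("wn:movie", "wn:show");
        HYPERNYM_OF.put("wn:show", "wn:creation");
        HYPERNYM_OF.put("wn:book", "wn:publication");
        HYPERNYM_OF.put("wn:publication", "wn:creation");
    }

    // Translates a term of the temporary vocabulary; returns null if no mapping exists.
    static String mapTerm(String scraperTerm) {
        return TERM_TO_CLASS.get(scraperTerm);
    }

    // Walks the hypernym chain upwards, treating each hyponymy link as a subclass link.
    // The step counter guards against cycles in malformed data.
    static boolean isSubclassOf(String cls, String ancestor) {
        String current = cls;
        for (int steps = 0; current != null && steps <= HYPERNYM_OF.size(); steps++) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = HYPERNYM_OF.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        String cls = mapTerm("Film");
        System.out.println(cls + " is a wn:creation: " + isSubclassOf(cls, "wn:creation"));
        System.out.println(cls + " is a wn:publication: " + isSubclassOf(cls, "wn:publication"));
    }
}
```

In a real triple store this lookup would be a query against the stored hierarchy rather than an in-memory map; the sketch only shows why a hyponymy set is enough to answer "is this entity a kind of X" questions.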

Scrapers

The following two files are the scrapers, which are essentially wrappers for specific websites (CSFD.cz and DatabazeKnih.cz in this case). They depend on the SemNet, Sesame and HTMLCleaner libraries, so they cannot be compiled on their own and are included here for illustration only.

CSFDScraper.java
The scraper class for CSFD.cz. Contains scrapers for Actor/Director and Film.
DBKnih.java
The scraper class for DatabazeKnih.cz. Contains scrapers for Book, Short story and Writer.
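
To give a rough idea of what "wrapper for a website" means here, the following sketch turns one page into a few statements in a temporary vocabulary. It is a simplified stand-in and not the code of CSFDScraper.java or DBKnih.java: the class and method names, the example.org URL, the markup and the "type"/"title" terms are all assumptions, and a regular expression replaces the HTMLCleaner-based parsing so that the example compiles with the standard library alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScraperSketch {
    // Assumed markup: the title sits in the first <h1> element of the page.
    static final Pattern TITLE = Pattern.compile("<h1>([^<]+)</h1>");

    // Returns (subject, predicate, object) statements in a hypothetical temporary vocabulary.
    static List<String[]> scrapeFilm(String url, String html) {
        List<String[]> statements = new ArrayList<>();
        // The page URL identifies the entity; its type is a temporary term to be mapped later.
        statements.add(new String[] {url, "type", "Film"});
        Matcher m = TITLE.matcher(html);
        if (m.find()) {
            statements.add(new String[] {url, "title", m.group(1).trim()});
        }
        return statements;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1> Example Film </h1></body></html>";
        for (String[] s : scrapeFilm("http://example.org/film/1", html)) {
            System.out.println(s[0] + " " + s[1] + " " + s[2]);
        }
    }
}
```

The temporary terms emitted by such a wrapper ("Film" in this sketch) are the ones that a mapping file like wn_map.xml would translate into the vocabulary of the built network.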

The compiled files can be downloaded as part of the complete sample archive, which includes the SemNet binaries, all dependencies and the ArtNet configuration.

semnet_artnet_conf.zip (12 MB)

SemNet binaries, required libraries, sources, documentation, ArtNet configuration files