WP3: Epiwork Epidemic Marketplace
The Epidemic Marketplace is an e-Science platform for collecting, storing and managing semantically annotated epidemic data collections for epidemic modellers.
It is publicly available at http://www.epimarketplace.net/
In recent years, the availability of a huge flow of quantitative social, demographic and behavioral data has spurred interest in innovative technologies to improve disease surveillance systems, providing faster and better geo-referenced outbreak detection capabilities. These capabilities depend on the availability of fine-tuned models, which require accurate and comprehensive data. However, the increasing amount of data raises the problem of data integration and management. New solutions are needed to ensure that data are correctly stored, managed and made available to the scientific community. For instance, digital repository systems were developed to provide a framework for the creation, management and preservation of existing and evolving forms of digital content (http://www.fedora-commons.org). These systems are only effective if they 1) collect, preserve and provide data in multiple formats; 2) provide user access management features; 3) organize data according to multiple dimensions, including subject, relevance and accuracy; 4) support metadata annotation to describe the data; and 5) actively involve the community.
To support these requirements, the members of Work Package 3 have developed an information platform to mediate access to distributed collections of public health data, offering data providers who want to collaborate with epidemiological modellers an easy and safe way to share data.

The Epidemic Marketplace can be defined as a distributed virtual repository, a platform supporting transparent, seamless access to distributed, heterogeneous and redundant resources. It is a virtual repository because data can be stored in systems that are external to the Epidemic Marketplace, and it provides transparent access because several heterogeneities are hidden from its users. The Epidemic Marketplace is composed of a set of geographically distributed, interconnected data management nodes that share common canonical data models, authorization infrastructure and access interfaces. Data can be either stored in one or more repositories or retrieved from external data sources using authorization credentials provided by clients. Data can also be replicated among repositories to improve access time, availability and fault tolerance. However, data replication is not mandatory; in several cases data must be stored at a single site due to, for instance, security constraints. It is worth noting, though, that any individual repository that is part of the Marketplace will enable virtualized access to these data once a user provides adequate security credentials.
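The virtualized access described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class names, the dataset identifiers and the plain-string credential check are hypothetical and do not reflect the actual Epidemic Marketplace implementation or its authorization infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One data management node (hypothetical model, for illustration only)."""
    name: str
    # dataset id -> (content, credential required to read it, or None if open)
    datasets: dict = field(default_factory=dict)


class Marketplace:
    """Toy virtual repository: a client asks for a dataset, not for a node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def fetch(self, dataset_id, credential=None):
        # Look through the nodes in order; the first node holding a copy answers.
        for node in self.nodes:
            entry = node.datasets.get(dataset_id)
            if entry is None:
                continue
            content, required = entry
            # Assumption: credentials are compared as plain strings.
            if required is not None and credential != required:
                raise PermissionError(f"credential required for {dataset_id!r}")
            return node.name, content
        raise KeyError(dataset_id)


if __name__ == "__main__":
    lisbon = Node("lisbon", {"h1n1-au": ("cumulative cases", None)})
    partner = Node("partner", {"restricted": ("sensitive records", "secret-token")})
    market = Marketplace([lisbon, partner])
    print(market.fetch("h1n1-au"))
    print(market.fetch("restricted", credential="secret-token"))
```

In this sketch the client never names a node. A dataset held at a single site for security reasons is still reachable through the same call, provided the right credential is supplied.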
An Epidemic Marketplace node has the following modules:
- Repository: stores epidemic data sets and an epidemic ontology to characterise the semantic information of the data sets.
- Mediator: a collection of web services that will provide access to internal data and external sources, based on a catalogue describing existing epidemic databases through their metadata using state-of-the-art semantic-web/grid technologies.
- Collector: retrieves information on real-time disease incidences from publicly available data sources, such as social networks; after retrieval, the collector groups the incidences by subject and creates data sets to store in the repository.
- Forum: allows users to post comments on integrated data from other modules, fostering collaboration among modellers.
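As a minimal illustration of the Collector's grouping step described in the list above, the following Python sketch groups short messages by the disease they mention. The keyword list, the message format and the function name are assumptions made for this example; they are not taken from the Epidemic Marketplace code.

```python
from collections import defaultdict

# Hypothetical keyword list; the real Collector's vocabulary is not described here.
DISEASE_KEYWORDS = ("flu", "h1n1", "measles")


def group_by_subject(messages, keywords=DISEASE_KEYWORDS):
    """Group messages into one data set per disease keyword they mention."""
    datasets = defaultdict(list)
    for message in messages:
        # Naive whitespace tokenization; punctuation handling is left out on purpose.
        words = set(message.lower().split())
        for keyword in keywords:
            if keyword in words:
                datasets[keyword].append(message)
    return dict(datasets)


if __name__ == "__main__":
    sample = [
        "Stuck at home with the flu",
        "H1N1 cases reported nearby",
        "Lovely weather today",
    ]
    for subject, items in group_by_subject(sample).items():
        print(subject, len(items))
```

Messages that mention no known keyword are simply dropped in this sketch; a message mentioning several diseases is placed in each matching data set.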
Several open-source tools and open standards are being used in the Epidemic Marketplace implementation and deployment process. We selected Fedora Commons (http://www.fedora-commons.org) and Muradora (http://www.muradora.org/muradora) for the implementation of the main features of the repository.
A preliminary prototype of the Data Collector is now collecting data from Twitter on a daily basis. A new version of the Data Collector is being implemented; it will have a graphical user interface and the capability of collecting data both actively and passively from multiple sources. The user will be able to dynamically configure new data collection processes through the graphical interface and a number of pre-defined services. The forum is currently available. It is implemented using phpBB and is integrated with other modules of the Epidemic Marketplace.
Lisbon is the first site where an Epidemic Marketplace node has been deployed. We envision near-future node deployments at the sites of our partners in the Netherlands and Italy. The total number of Epidemic Marketplace nodes will depend on strategic decisions to be made by the Epiwork participants as the project evolves. It is worth noting that the Epidemic Marketplace is able to handle data on any communicable disease, depending on the needs of users.
Datasets
The repository already contains several resources, added mostly to demonstrate the repository functionality and the metadata schema. These resources include datasets, web resources such as sites containing relevant epidemiological information, references to institutions working in the epidemiological and public health areas, and documents such as technical reports and scientific articles. At the moment, access to this information requires registering and logging in to the system, which is currently available only to the Epiwork partners. Unregistered users can enter the repository and browse public collections but cannot access data. The repository includes datasets from the Data Collector, which contain data collected from Twitter. These datasets are composed of messages that refer to diseases. The repository also contains other datasets, such as cumulative cases of H1N1 in Australia and a dataset of the US Air Transportation System.
Other documents stored in the repository include a document of H1N1 vaccine dose plans.
Other resources, such as descriptions of institutions or web sites, contain only metadata describing those resources and their location.
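The distinction between browsing and accessing data, and between full and metadata-only resources, can be sketched in Python as follows. The field names and the functions are hypothetical, chosen for this example; the actual metadata schema and access control of the repository are not described in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical repository entry; field names are illustrative only."""
    title: str
    location: str                 # where the described resource lives
    public: bool = True           # whether it sits in a publicly browsable collection
    data: Optional[bytes] = None  # None for metadata-only resources


def browse(resource, registered):
    """Return the visible metadata, or None if hidden from this user."""
    if resource.public or registered:
        return {"title": resource.title, "location": resource.location}
    return None


def download(resource, registered):
    """Return the stored data; only registered users may access it."""
    if not registered:
        raise PermissionError("registration is required to access data")
    if resource.data is None:
        raise LookupError("metadata-only resource: nothing to download")
    return resource.data


if __name__ == "__main__":
    site = Resource("Example institution", "http://example.org/")
    print(browse(site, registered=False))
```

Here an unregistered visitor can see the metadata of a public resource but cannot download its data, and a metadata-only entry has nothing to download even for a registered user.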


