New Data Platform and Catalogue

16-12-2020

- by Marrit van der Burgh

Our data management systems are currently undergoing a major transformation. At the moment, our data is structured in tables (i.e. datasets containing multiple participants and multiple results associated with a specific scientific theme such as “smoking” or “ECG”). This system has worked well enough in the past, but it has some limitations that will become increasingly problematic as our dataset grows rapidly.

Latest news!

Note that from January 1st onwards, all incoming applications and amendments to ongoing projects will be processed via the new system. We encourage you at that point to completely renew your ‘old’ dataset, including the amendment variables, to benefit from the advantages described below. If you have an ‘old’ dataset in an active workspace, you may also renew it without an amendment, free of charge. Please contact our data management team to get this started (in 2021).

The new platform

We have therefore designed, together with our new ICT partner Trivento, a high-resolution data model in which our assessment results are organized individually as “data points” rather than in thematic tables. Each data point is provided with three pieces of basic metadata:

  • WHO provided this data point (i.e. the basic personal traits of the participant)
  • WHEN was it provided (at what time point and in what context)
  • WHAT was being assessed (which survey question or what physical measurement)
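As a rough illustration of this idea (a sketch only, not the platform's actual schema), a data point with its WHO/WHEN/WHAT metadata could be represented as below. All names here, such as `DataPoint`, `participant_id` and `to_thematic_table`, are hypothetical and chosen for the example; the helper shows how an old-style thematic table could still be assembled from individual data points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """One assessment result with its three pieces of metadata (hypothetical fields)."""
    participant_id: str  # WHO provided the data point
    assessment: str      # WHEN it was provided (time point / context)
    variable: str        # WHAT was assessed (survey question or measurement)
    value: object        # the result itself


def to_thematic_table(points, variables):
    """Group the selected variables into one row per (participant, assessment)."""
    rows = {}
    for p in points:
        if p.variable in variables:
            key = (p.participant_id, p.assessment)
            rows.setdefault(key, {})[p.variable] = p.value
    return rows


# Made-up example values, for illustration only.
points = [
    DataPoint("P001", "baseline", "smoking_status", "never"),
    DataPoint("P001", "baseline", "heart_rate", 62),
    DataPoint("P002", "baseline", "smoking_status", "current"),
]

table = to_thematic_table(points, {"smoking_status"})
```

Storing results at this level of detail means that a table is just one possible view of the data: the same data points can be regrouped by theme, assessment or participant type without changing how they are stored.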

New online data catalogue

The new data model will also be reflected in our new online data catalogue. Researchers may assemble their required dataset by selecting participant types (based on traits such as age, sex, availability of genotype), assessments (general assessments and/or specific additional assessments), and variables (thematically grouped in 12 sections and 199 subsections). Information about the scientific or procedural background of coherent data collections (“HOW”) is provided online in the new wiki and in FDI documents in the workspace.
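The three kinds of selection described above can be pictured as three filters applied to the data points. The following is a minimal sketch under stated assumptions: the function `select_dataset`, the dictionary keys and the example values are all hypothetical and do not reflect how the catalogue is implemented.

```python
def select_dataset(points, participants, min_age, assessments, variables):
    """Keep data points that match the participant, assessment and variable filters."""
    # Participant filter: here based on a single trait (age), as an example.
    eligible = {pid for pid, traits in participants.items() if traits["age"] >= min_age}
    return [
        p for p in points
        if p["who"] in eligible          # participant type
        and p["when"] in assessments     # assessment
        and p["what"] in variables       # variable
    ]


# Made-up participants and data points, for illustration only.
participants = {
    "P001": {"age": 45, "sex": "F"},
    "P002": {"age": 17, "sex": "M"},
}
points = [
    {"who": "P001", "when": "baseline", "what": "smoking_status", "value": "never"},
    {"who": "P001", "when": "follow_up", "what": "smoking_status", "value": "never"},
    {"who": "P002", "when": "baseline", "what": "smoking_status", "value": "current"},
    {"who": "P001", "when": "baseline", "what": "heart_rate", "value": 62},
]

selected = select_dataset(
    points,
    participants,
    min_age=18,
    assessments={"baseline"},
    variables={"smoking_status"},
)
```

In this example only the baseline smoking status of participant P001 remains, because the other data points fail at least one of the three filters.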

The important benefits

The new platform and catalogue will have several important benefits:

  • The catalogue (in combination with the continuously evolving wiki) provides much more detailed information about our data before you order a dataset. 
  • The catalogue lets you order by variable and provides assessment and participant filters, making your dataset much more manageable. 
  • The platform processes incoming (newly generated) and outgoing (ordered) datasets in a rapid and highly automated manner. As a result, newly collected datasets become available for use almost immediately and with a high standard of (meta)data quality.
  • The tables in released datasets are restructured in a more logical, clearly defined, easy to understand and future-proof manner. 
  • We have added important information to each data order, presented in a clear and organized manner.

Our colleague Trynke de Jong about the changes

We like feedback!

We would never have been able to design the new structure without YOUR feedback, your complaints, your compliments. Please keep them coming. Examples of feedback we welcome with open arms:

•    Mistakes such as typos, wrong or incomplete labels, or awkward translations
•    Illogical or confusing variable codes
•    Suggestions for sections or subsections (e.g. requests to alter or split them, or to change the location of a variable)
•    Additional information for our wiki, such as publications or background information
•    Any frustration or irritation you experienced in the ordering process or when analysing data

We continuously strive to perfect the system and therefore update the wiki and catalogue regularly. We appreciate your feedback, which you can give in the standard evaluation form during your project. You can also e-mail it to our data management team at any time.

As mentioned earlier, we are doing this together with Trivento. The new data catalogue is developed by our long-term partner Molgenis. Please stay tuned for more on this topic.