WP4 Upload format and guidelines
Dataset upload is kept user-friendly and relatively unconstrained, given the heterogeneity of the information. The following guidelines nevertheless apply.
- Currently, the system accepts plain text files or ZIP archives. It is recommended to provide the MD5 checksum of every uploaded file, so that errors introduced during the data transfer can be detected.
- The genotype section is intended to store genotypic information, such as FASTA sequences, BAM alignments, VCF files, SNP-chip data and PLINK (MAP/PED) genotype data. Files need to be compressed and, when possible, indexed (e.g. BAM, VCF).
- The reference genome used for the analyses should be clearly identified, publicly available and based on the most recent assembly. The binary format (BED/BIM/FAM) is preferred for PLINK data files.
- As the repository is meant to be an operational tool, data should be ready to be processed with publicly available software. Thus, data should not be uploaded in a proprietary format.
- Raw data from genotyping should be provided in text format, compressed and documented.
- The preferred format for phenotype information is CSV encoded in UTF-8 or ASCII. Columns need to be clearly named and described, preferably by stating the column type (integer, string, boolean) and whether only a limited set of values is allowed.
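The MD5 recommendation in the guidelines above can be followed with standard tools. The sketch below is one possible way to do it in Python, using only the standard library. The function names `md5_checksum` and `write_md5_manifest` and the manifest layout are assumptions for illustration, not something the upload system is known to require.

```python
import hashlib
from pathlib import Path


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_manifest(paths, manifest_path):
    """Write one '<digest>  <file name>' line per file (hypothetical layout)."""
    lines = [f"{md5_checksum(p)}  {Path(p).name}" for p in paths]
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Comparing the digest computed before upload with one computed on the received file shows whether the transfer altered the content.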
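To illustrate what a column description for phenotype data might look like, here is a minimal sketch that pairs each column with a type and an optional set of allowed values, then checks a CSV against it. The column names (`animal_id`, `age_months`, `vaccinated`, `sex`), the `COLUMNS` layout and the function names are all hypothetical; the guidelines do not prescribe a particular schema format.

```python
import csv
import io

# Hypothetical column descriptions: name -> (type, allowed values or None).
COLUMNS = {
    "animal_id": ("string", None),
    "age_months": ("integer", None),
    "vaccinated": ("boolean", None),
    "sex": ("string", {"M", "F"}),
}


def value_is_valid(value, col_type, allowed):
    """Check one cell against its declared type and allowed values."""
    if value is None:  # row shorter than the header
        return False
    if allowed is not None:
        return value in allowed
    if col_type == "integer":
        try:
            int(value)
            return True
        except ValueError:
            return False
    if col_type == "boolean":
        return value in {"true", "false"}
    return True  # "string": any text is accepted


def validate_phenotypes(text, columns=COLUMNS):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in columns if name not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    errors = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        for name, (col_type, allowed) in columns.items():
            if not value_is_valid(row[name], col_type, allowed):
                errors.append(
                    f"line {line_no}: {name}={row[name]!r} is not a valid {col_type}"
                )
    return errors
```

A description of this kind can be shipped alongside the CSV so that others can check the file with publicly available software before processing it.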