Dataset posted on 16.06.2020 by Dominic Maderazo
Refer to Fig 1 "Classification" and Table 1
These files contain the genomic coordinates that do not overlap UCSC exons, with each file holding the coordinates allocated to one cluster by the k-means clustering.
They are in a BED-like format with the following columns:
Chrom segStart segEnd gcContent conservationLevel
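As an illustration of the column convention above, here is a minimal Python sketch that reads one record into a typed structure. It assumes the fields are separated by whitespace (tabs or spaces) and that gcContent and conservationLevel are numeric. Neither assumption is stated in the description, and the names Segment and parse_segment are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One record of the BED-like cluster files (field types are assumed)."""
    chrom: str
    seg_start: int
    seg_end: int
    gc_content: float
    conservation_level: float


def parse_segment(line: str) -> Segment:
    """Parse a single whitespace-delimited record with exactly five fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    chrom, start, end, gc, conservation = fields
    return Segment(chrom, int(start), int(end), float(gc), float(conservation))


# Made-up record for illustration only; it is not taken from the dataset.
example = parse_segment("chr1\t100\t200\t0.45\t0.8")
```

If the files turn out to carry a header row or a different delimiter, the split and the type conversions would need adjusting.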
As they are, the files can be used to perform the binary classification with the enrichments set out in the paper. Results from this can then be used to estimate the performance of binary classifiers and identify an optimal combination.
Users wishing to do the de novo motif analysis and ontology association are required to remove the last 2 columns so that the files are suitable for Trawler and GREAT.
These files can also be uploaded to UCSC after the final 2 columns have been deleted.
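One possible way to remove the last 2 columns is sketched below in Python. It keeps only Chrom, segStart and segEnd and writes them tab-separated. The sketch assumes whitespace-delimited input with no header row. The function names and any file paths passed to them are hypothetical, and the exact input requirements of Trawler, GREAT and UCSC should be checked against their own documentation.

```python
from typing import Iterable, Iterator


def to_three_columns(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'chrom<TAB>start<TAB>end' for each non-blank input record."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"record has fewer than 3 fields: {line!r}")
        # Keep the coordinates, drop gcContent and conservationLevel.
        yield "\t".join(fields[:3])


def strip_file(src_path: str, dst_path: str) -> None:
    """Write a three-column copy of src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for record in to_three_columns(src):
            dst.write(record + "\n")
```

A call such as strip_file("cluster1.bed", "cluster1.bed3") would leave the original file untouched and write the reduced copy alongside it; both file names here are placeholders.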