analysis_data sheet
Data about the processing from raw sequences to derived outputs, including software versions, processing parameters, and the reference database used. Often there is only one row for each type of molecular preparation that is sequenced.
Terms
| Term | Definition | Required By |
|---|---|---|
| amplicon_sequenced | If amplicon metabarcoding was performed, list the amplicons separated by a vertical bar (\|)? Each name MUST match the value provided for amplicon_sequenced on the prep_data sheet. If metabarcoding was not performed, enter "not applicable". Only used for internal data management. | Recommended |
| ampliconSize | The length of the amplicon in base pairs. Median? | Recommended |
| trim_method | Method for trimming, including version and parameters | Recommended |
| cluster_method | Approach/algorithm used when defining OTUs or ASVs; include version and parameters separated by semicolons. Converted to otu_class_appr for DwC. | Recommended |
| pid_clustering | Percent identity used when clustering "species-level" OTUs or ASVs. Converted to otu_class_appr for DwC. | Recommended |
| taxa_class_method | Method for assigning taxonomy, including version and parameters separated by semicolons. Converted to otu_seq_comp_appr for DwC. | OBIS |
| taxa_ref_db | Reference database used for taxonomic assignment. Converted to otu_db for DwC. | OBIS |
| code_repo | Link to the public repository where the analysis code is archived. Converted to identificationReferences for DwC. | OBIS |
| sop | Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences. A reference to a well-documented protocol, e.g. on protocols.io. | Recommended |
| identificationReferences | A list (concatenated and separated) of references (publication, global unique identifier, URI) used in the Identification. Recommended best practice is to separate the values in a list with space vertical bar space ( \| ). | OBIS |
| controls_used | Provide the number and types of controls or blanks used. Converted to eventRemarks for the sequencing library event. | Recommended |
| assembly_qual | The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to, total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totaling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated. | Recommended |
| assembly_software | Tool(s) used for assembly, including version number and parameters | Recommended |
| annot | Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter | Recommended |
| number_contig | Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG | Recommended |
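
The "Converted to ... for DwC" notes in the table can be read as a simple term mapping. The sketch below is illustrative only and is not the template's actual conversion code: the function name `convert_row`, the example values, and the `"; "` separator used when two terms land in the same target field (for example cluster_method and pid_clustering) are all assumptions. The ` | ` separator for identificationReferences follows the recommendation in the table.

```python
# Mapping transcribed from the "Converted to ..." notes in the Terms table.
TERM_TO_DWC = {
    "cluster_method": "otu_class_appr",
    "pid_clustering": "otu_class_appr",
    "taxa_class_method": "otu_seq_comp_appr",
    "taxa_ref_db": "otu_db",
    "code_repo": "identificationReferences",
    "identificationReferences": "identificationReferences",
    "controls_used": "eventRemarks",
}

# " | " is the separator recommended for identificationReferences.
SEPARATORS = {"identificationReferences": " | "}
# Assumption: the table does not say how to merge other terms that share a target.
DEFAULT_SEPARATOR = "; "


def convert_row(row):
    """Collect the filled-in terms of one analysis_data row under their target names."""
    collected = {}
    for term, target in TERM_TO_DWC.items():
        raw = row.get(term)
        if raw is None:
            continue
        value = str(raw).strip()
        if not value:
            continue  # skip empty cells
        collected.setdefault(target, []).append(value)
    return {
        target: SEPARATORS.get(target, DEFAULT_SEPARATOR).join(values)
        for target, values in collected.items()
    }


# Hypothetical row; every value here is a placeholder, not real data.
example_row = {
    "cluster_method": "toolA v1.0",
    "pid_clustering": "97",
    "taxa_ref_db": "exampleDB v1",
    "code_repo": "https://example.org/repo",
    "identificationReferences": "https://example.org/paper",
}
print(convert_row(example_row))
```

Terms without a conversion note (for example trim_method or sop) are left out of the mapping on purpose, since the table does not say where they go.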