analysis_data sheet

Data about the processing of raw sequences into derived outputs, including software versions, processing parameters, and the reference database used. Often there is only one row for each type of molecular preparation that is sequenced.

Terms

Term Definition Required By
amplicon_sequenced If amplicon metabarcoding was performed, list the amplicons separated by a |? Each name MUST match the value provided for amplicon_sequenced on the prep_data sheet. If metabarcoding was not performed, enter "not applicable". Only used for internal data management. Recommended
ampliconSize The length of the amplicon in base pairs. Median? Recommended
trim_method Method for trimming, including version and parameters Recommended
cluster_method Approach/algorithm used to define OTUs or ASVs, including version and parameters separated by semicolons. Converted to otu_class_appr for DwC Recommended
pid_clustering Percent identity used when clustering "species-level" OTUs or ASVs. Converted to otu_class_appr for DwC Recommended
taxa_class_method Method for assigning taxonomy, including version and parameters separated by semicolons. Converted to 'otu_seq_comp_appr' for DwC OBIS
taxa_ref_db Reference database used for taxonomic assignment. Converted to 'otu_db' for DwC OBIS
code_repo Link to the public repository where the analysis code is archived. Converted to identificationReferences for DwC OBIS
sop Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences. Provide a reference to a well-documented protocol, e.g. on protocols.io. Recommended
identificationReferences A list (concatenated and separated) of references (publication, global unique identifier, URI) used in the Identification. Recommended best practice is to separate the values in a list with space vertical bar space ( | ). OBIS
controls_used Provide the number and types of controls or blanks used. Converted to eventRemarks for the sequencing library event Recommended
assembly_qual The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG: Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to, total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG: Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totaling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated. Recommended
assembly_software Tool(s) used for assembly, including version number and parameters Recommended
annot Tool used for annotation, or the source of the annotation in cases where it was provided by a community jamboree or model organism database rather than by a specific submitter Recommended
number_contig Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG Recommended
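
The "Converted to ..." notes in the table above can be read as a simple term-to-term mapping. The following is a minimal Python sketch of that reading, not the template's actual conversion code. The function names (split_amplicons, to_dwc), the example values, and the choice to join several source values with " | " when they share one target term are all assumptions made for illustration.

```python
# Source term -> target term, taken from the "Converted to ..." notes above.
DWC_TARGETS = {
    "cluster_method": "otu_class_appr",
    "pid_clustering": "otu_class_appr",
    "taxa_class_method": "otu_seq_comp_appr",
    "taxa_ref_db": "otu_db",
    "code_repo": "identificationReferences",
    "identificationReferences": "identificationReferences",  # passed through unchanged
    "controls_used": "eventRemarks",
}


def split_amplicons(value):
    """Split an amplicon_sequenced value on '|' into a list of amplicon names."""
    if value.strip().lower() == "not applicable":
        return []
    return [name.strip() for name in value.split("|") if name.strip()]


def to_dwc(row, sep=" | "):
    """Collect the values of one analysis_data row under their target terms.

    Joining values that share a target with `sep` is an assumption,
    not a rule stated in the table.
    """
    collected = {}
    for term, target in DWC_TARGETS.items():
        value = str(row.get(term, "")).strip()
        if value:
            collected.setdefault(target, []).append(value)
    return {target: sep.join(values) for target, values in collected.items()}


# Hypothetical row; none of these values come from a real dataset.
example_row = {
    "amplicon_sequenced": "16S V4-V5 | 18S V9",
    "cluster_method": "ExampleTool; v1.0",
    "pid_clustering": "97",
    "taxa_ref_db": "ExampleDB v1",
    "code_repo": "https://example.org/analysis-code",
}

print(split_amplicons(example_row["amplicon_sequenced"]))
print(to_dwc(example_row))
```

Terms that are empty or missing from the row are skipped, so the result only contains target terms that actually received a value.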