water_sample_data Sheet

Contextual data about the samples collected: when and where each sample was collected, what kind of sample it is, and the properties of the environment or experimental condition from which it was taken. Each row is a distinct sample. Most of this information is recorded during sample collection. Many terms have a controlled vocabulary, such as organism, env_broad_scale, and waterBody. This file contains the information that is submitted to NCBI when generating a BioSample. Other important fields for metadata processing include amplicon_sequenced, which helps to link together different types of metadata. This sheet contains terms from the MIMARKS survey water 6.0 package. For other types of samples (e.g., sediment), use the appropriate template file.

Term definition required_by
sample_name Sample Name is a name that you choose for the sample. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. Every Sample Name from a single Submitter must be unique. Suggested format: PROJECT_REGION_STATION_DEPTH_REPLICATE NCBI+OBIS
organism Often "seawater metagenome" or "sediment metagenome". Use "synthetic metagenome" for mock communities. The most descriptive organism name for this sample (to the species, if possible). It is OK to submit an organism name that is not in our database. In the case of a new species, provide the desired organism name, and our taxonomists may assign a provisional taxID. In the case of unidentified species, choose the appropriate Genus and include 'sp.', e.g., "Escherichia sp.". When sequencing a genome from a non-metagenomic source, include a strain or isolate name too, e.g., "Pseudomonas sp. UK4". For environmental or microbiome samples, use the metagenomes taxonomy (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169). For more information about providing a valid organism, including new species, metagenomes (microbiomes) and metagenome-assembled genomes, see https://www.ncbi.nlm.nih.gov/biosample/docs/organism/. NCBI+OBIS
collection_date the date on which the sample was collected in UTC; date/time ranges are supported by providing two dates from among the supported value formats, delimited by a forward-slash character; collection times are supported by adding "T", then the hour and minute after the date, and must be in Coordinated Universal Time (UTC), otherwise known as "Zulu Time" (Z); supported formats include "DD-Mmm-YYYY", "Mmm-YYYY", "YYYY" or ISO 8601 standard "YYYY-mm-dd", "YYYY-mm", "YYYY-mm-ddThh:mm:ss"; e.g., 30-Oct-1990, Oct-1990, 1990, 1990-10-30, 1990-10, 21-Oct-1952/15-Feb-1953, 2015-10-11T17:53:03Z; valid non-ISO dates will be automatically transformed to ISO format NCBI+OBIS
depth Depth is defined as the vertical distance below surface. Depth can be reported as an interval for subsurface samples. Provide depth in meters, e.g., "5 m". Format: {float} {unit} NCBI+OBIS
env_broad_scale Add terms that identify the major environment type(s) where your sample was collected. Recommend subclasses of biome [ENVO:00000428]. https://ontobee.org/ontology/ENVO?iri=http://purl.obolibrary.org/obo/ENVO_00000428 Multiple terms can be separated by one or more pipes, e.g.: mangrove biome [ENVO:01000181]|estuarine biome [ENVO:01000020] NCBI+OBIS
env_local_scale Add terms that identify environmental entities having causal influences upon the entity at time of sampling. Please use terms that are present in ENVO and which are of smaller spatial grain than your entry for env_broad_scale. Multiple terms can be separated by pipes, e.g.: shoreline [ENVO:00000486]|intertidal zone [ENVO:00000316] NCBI+OBIS
env_medium Add terms that identify the material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. http://purl.obolibrary.org/obo/ENVO_00010483 Multiple terms can be separated by pipes e.g.: estuarine water [ENVO:01000301]|estuarine mud [ENVO:00002160] NCBI+OBIS
geo_loc_name Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg "Canada: Vancouver" or "Germany: halfway down Zugspitze, Alps" NCBI+OBIS
lat_lon The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format "d[d.dddd] N|S d[dd.dddd] W|E", e.g., 38.98 N 77.11 W. Formula: =CONCATENATE(AA19," N ",REPLACE(AB9,1,1,"")," E") NCBI+OBIS
description Description of the sample Optional
serial_number Specific to NOAA Omics, the serial number or unique id associated with the sample Recommended
extract_number Unique identifier of sample used for extraction and/or sequencing. Recommended format: Plate and well position of the DNA extraction. Recommended
project_id Internal short id for organizing projects. Recommended
source_mat_id A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID). This identifier refers to the original material collected. We use it to identify CTD bottles or sediment trap cups, i.e., the source material before it was separated into biological or technical replicates. Recommended
bioproject_accession The accession number of the BioProject(s) to which the BioSample belongs. If the BioSample belongs to more than one BioProject, enter multiple bioproject_accession columns. A valid BioProject accession has prefix PRJN, PRJE or PRJD, e.g., PRJNA12345. Recommended
biosample_accession BioSample accession from NCBI, provided after creating a biosample on NCBI, such as during the SRA submission process Recommended
amplicon_sequenced If amplicon metabarcoding was performed, list the amplicons separated by a pipe (|). Each name MUST match the value provided to amplicon_sequenced on the prep_data sheet. If metabarcoding was not performed, list "not applicable". Only used for internal data management. Recommended
metagenome_sequenced If metagenomic sequencing was performed, put Yes. If not performed, put "not applicable". Only used for internal data management. Recommended
collection_date_local The date on which the sample was collected in local time in ISO format; date/time ranges are supported by providing two dates from among the supported value formats, delimited by a forward-slash character; collection times are supported by adding "T", then the hour and minute after the date. Recommended
waterBody The name of the water body in which the dcterms:Location occurs. Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended
countryCode Recommended best practice is to use an ISO 3166-1-alpha-2 country code. Recommended best practice is to use XZ if outside the EEZ (i.e. open seas). Lookup codes here: https://www.iso.org/obp/ui/#search Recommended
decimalLatitude The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive. Recommended
decimalLongitude The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive. Recommended
geodeticDatum The spatial reference system (SRS) upon which the geographic coordinates given in decimalLatitude and decimalLongitude are based. Often WGS84. https://dwc.tdwg.org/terms/#dwc:geodeticDatum Recommended
samp_vol_we_dna_ext Volume (ml) or mass (g) of total collected sample processed for DNA extraction Recommended
samp_collect_device The device used to collect an environmental sample. This field accepts terms listed under environmental sampling device (http://purl.obolibrary.org/obo/ENVO). This field also accepts terms listed under specimen collection device (http://purl.obolibrary.org/obo/GENEPIO_0002094) Recommended
samp_mat_process Any processing applied to the sample during or after retrieving the sample from environment. Recommended
dna_conc Concentration of DNA (weight ng/volume µl) Recommended
concentrationUnit Unit used for concentration measurement Recommended
sample_type Type of sample, can be: seawater, sediment, or various types of blanks Recommended
collection_method The name of, reference to, or description of the method or protocol used during a sampling Event. https://dwc.tdwg.org/terms/#dwc:samplingProtocol Recommended
basisOfRecord The specific nature of the data record - a subtype of the dcterms:type. For DNA-derived occurrences (see Category I and Category III), use MaterialSample. For enriched occurrences, use PreservedSpecimen or LivingSpecimen as appropriate. https://docs.gbif-uat.org/publishing-dna-derived-data/1.0/en/#mapping-metabarcoding-edna-and-barcoding-data Recommended
sample_replicate Required if your samples are biological replicates from a single water sample. Optional
cruise_id Identifier for the cruise, with year in parentheses. Optional
line_id Standard in OAP cruise management. Refers to the cruise line. For cruises without lines, this can be one of several cohesive regions where samples were collected. Optional
station Station ID if used during the cruise. Optional
ctd_bottle_no This column is important if you have replicate subsamples taken from a single water sample, so as to link those subsamples together. This can correspond to a specific Niskin collection event (specific bottle at specific time and specific depth), or whatever was recorded in the field. When matching with OAP data, this is the Sample_ID. Optional
biological_replicates The other biological replicates (sample names) paired with the sample. Separated by a pipe (|). Optional
sample_title Title of the Biosample when seen on NCBI. Suggest a short descriptive name. Optional
notes_sampling notes about the sample not covered by other metadata. Can be used internally or submitted to NCBI Optional
size_frac Filtering pore size used in sample preparation Optional
alkalinity alkalinity, the ability of a solution to neutralize acids to the equivalence point of carbonate or bicarbonate Optional
alkalinity_method Method used for alkalinity measurement Optional
alkyl_diethers concentration of alkyl diethers Optional
altitude The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air. Optional
aminopept_act measurement of aminopeptidase activity Optional
ammonium concentration of ammonium Optional
atmospheric_data measurement of atmospheric data; can include multiple data Optional
bac_prod bacterial production in the water column measured by isotope uptake Optional
bac_resp measurement of bacterial respiration in the water column Optional
bacteria_carb_prod measurement of bacterial carbon production Optional
biomass amount of biomass; should include the name for the part of biomass measured, e.g. microbial, total. Can include multiple measurements Optional
bishomohopanol concentration of bishomohopanol Optional
bromide concentration of bromide Optional
calcium concentration of calcium Optional
carbo_nitro_ratio ratio of amount or concentrations of carbon to nitrogen Optional
chem_administration list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603 Optional
chloride concentration of chloride Optional
chlorophyll concentration of chlorophyll Optional
conduc electrical conductivity of water Optional
density density of sample Optional
diether_lipids concentration of diether lipids; can include multiple types of diether lipids Optional
diss_carb_dioxide concentration of dissolved carbon dioxide Optional
diss_hydrogen concentration of dissolved hydrogen Optional
diss_inorg_carb dissolved inorganic carbon concentration Optional
diss_inorg_nitro concentration of dissolved inorganic nitrogen Optional
diss_inorg_phosp concentration of dissolved inorganic phosphorus Optional
diss_org_carb concentration of dissolved organic carbon Optional
diss_org_nitro dissolved organic nitrogen concentration, measured as: total dissolved nitrogen - NH4 - NO3 - NO2 Optional
diss_oxygen concentration of dissolved oxygen Optional
down_par visible waveband radiance and irradiance measurements in the water column Optional
elev The elevation of the sampling site as measured by the vertical distance from mean sea level. Optional
fluor raw or converted fluorescence of water Optional
glucosidase_act measurement of glucosidase activity Optional
isolation_source Describes the physical, environmental and/or local geographical source of the biological sample from which the sample was derived. Optional
light_intensity measurement of light intensity Optional
magnesium concentration of magnesium Optional
mean_frict_vel measurement of mean friction velocity Optional
mean_peak_frict_vel measurement of mean peak friction velocity Optional
misc_param any other measurement performed or parameter collected, that is not listed here Optional
n_alkanes concentration of n-alkanes; can include multiple n-alkanes Optional
neg_cont_type The substance or equipment used as a negative control in an investigation, e.g., distilled water, phosphate buffer, empty collection device, empty collection tube, DNA-free PCR mix, sterile swab, sterile syringe Optional
nitrate concentration of nitrate Optional
nitrite concentration of nitrite Optional
nitro concentration of nitrogen (total) Optional
omics_observ_id A unique identifier of the omics-enabled observatory (or comparable time series) your data derives from. This identifier should be provided by the OMICON ontology; if you require a new identifier for your time series, contact the ontology's developers. Information is available here: https://github.com/GLOMICON/omicon. This field is only applicable to records which derive from an omics time-series or observatory. Optional
org_carb concentration of organic carbon Optional
org_matter concentration of organic matter Optional
org_nitro concentration of organic nitrogen Optional
organism_count total count of any organism per gram or volume of sample, should include name of organism followed by count; can include multiple organism counts Optional
oxy_stat_samp oxygenation status of sample Optional
part_org_carb concentration of particulate organic carbon Optional
part_org_nitro concentration of particulate organic nitrogen Optional
perturbation type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types Optional
petroleum_hydrocarb concentration of petroleum hydrocarbon Optional
pH pH measurement Optional
phaeopigments concentration of phaeopigments; can include multiple phaeopigments Optional
phosphate concentration of phosphate Optional
phosplipid_fatt_acid concentration of phospholipid fatty acids; can include multiple values Optional
photon_flux measurement of photon flux Optional
pos_cont_type The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive Optional
potassium concentration of potassium Optional
pressure pressure to which the sample is subject, in atmospheres Optional
primary_prod measurement of primary production Optional
redox_potential redox potential, measured relative to a hydrogen cell, indicating oxidation or reduction potential Optional
rel_to_oxygen Is this organism an aerobe, anaerobe? Please note that aerobic and anaerobic are valid descriptors for microbial environments, eg, aerobe, anaerobe, facultative, microaerophilic, microanaerobe, obligate aerobe, obligate anaerobe, missing, not applicable, not collected, not provided, restricted access Optional
salinity salinity measurement Optional
samp_store_dur Duration for which the sample was stored, written in ISO 8601 format Optional
samp_store_loc Location at which sample was stored, usually name of a specific freezer/room Optional
samp_store_temp Temperature at which sample was stored, e.g. -80 degree Celsius Optional
silicate concentration of silicate Optional
size_frac_low Refers to the mesh/pore size used to pre-filter/pre-sort the sample. Materials larger than the size threshold are excluded from the sample Optional
size_frac_up Refers to the mesh/pore size used to retain the sample. Materials smaller than the size threshold are excluded from the sample Optional
sodium sodium concentration Optional
soluble_react_phosp concentration of soluble reactive phosphorus Optional
source_material_id unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. Optional
sulfate concentration of sulfate Optional
sulfide concentration of sulfide Optional
suspend_part_matter concentration of suspended particulate matter Optional
temp temperature of the sample at time of sampling Optional
tidal_stage stage of tide Optional
tot_depth_water_col measurement of total depth of water column Optional
tot_diss_nitro total dissolved nitrogen concentration, reported as nitrogen, measured by: total dissolved nitrogen = NH4 + NO3 + NO2 + dissolved organic nitrogen Optional
tot_inorg_nitro total inorganic nitrogen content Optional
tot_nitro total nitrogen content of the sample Optional
tot_part_carb total particulate carbon content Optional
tot_phosp total phosphorus concentration, calculated by: total phosphorus = total dissolved phosphorus + particulate phosphorus. Can also be measured without filtering, reported as phosphorus Optional
turbidity turbidity measurement Optional
water_current measurement of magnitude and direction of flow within a fluid Optional
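
The lat_lon value can be derived from decimalLatitude and decimalLongitude. The following Python function is a minimal sketch of that conversion, not part of the template itself: the function name format_lat_lon and the decimals parameter are hypothetical, and rounding to four decimal places is an assumption based on the "d[d.dddd]" pattern in the lat_lon definition.

```python
def format_lat_lon(decimal_latitude: float, decimal_longitude: float, decimals: int = 4) -> str:
    """Build a lat_lon string such as '38.9800 N 77.1100 W' from signed decimal degrees."""
    # Legal ranges are taken from the decimalLatitude and decimalLongitude definitions.
    if not -90 <= decimal_latitude <= 90:
        raise ValueError(f"decimalLatitude out of range: {decimal_latitude}")
    if not -180 <= decimal_longitude <= 180:
        raise ValueError(f"decimalLongitude out of range: {decimal_longitude}")

    # Positive latitude is north, positive longitude is east.
    ns = "N" if decimal_latitude >= 0 else "S"
    ew = "E" if decimal_longitude >= 0 else "W"

    # The hemisphere letter replaces the sign, so format the absolute value.
    lat_text = f"{abs(decimal_latitude):.{decimals}f}"
    lon_text = f"{abs(decimal_longitude):.{decimals}f}"
    return f"{lat_text} {ns} {lon_text} {ew}"


if __name__ == "__main__":
    print(format_lat_lon(38.98, -77.11))
```

Choosing the hemisphere letter from the sign avoids hard-coding "N" or "E", which only holds for samples in one hemisphere.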
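
The collection_date definition lists several supported formats and allows ranges delimited by a forward slash. As a rough pre-submission check, the sketch below tests a value against those listed formats using only the Python standard library. The function names are hypothetical, accepting the timestamp with or without a trailing "Z" is an assumption, and month abbreviations are matched using the current locale (English is assumed).

```python
from datetime import datetime

# Formats listed in the collection_date definition, as strptime patterns.
SUPPORTED_FORMATS = (
    "%d-%b-%Y",            # DD-Mmm-YYYY, e.g. 30-Oct-1990
    "%b-%Y",               # Mmm-YYYY, e.g. Oct-1990
    "%Y",                  # YYYY
    "%Y-%m-%d",            # YYYY-mm-dd
    "%Y-%m",               # YYYY-mm
    "%Y-%m-%dT%H:%M:%S",   # YYYY-mm-ddThh:mm:ss
    "%Y-%m-%dT%H:%M:%SZ",  # same, with a trailing Z for UTC (assumption)
)


def _is_single_date(value: str) -> bool:
    """Return True if value matches one of the supported formats."""
    for fmt in SUPPORTED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_supported_collection_date(value: str) -> bool:
    """Accept a single date or a range of two dates separated by '/'."""
    parts = value.strip().split("/")
    if len(parts) > 2:
        return False
    return all(_is_single_date(part) for part in parts)


if __name__ == "__main__":
    for example in ("30-Oct-1990", "21-Oct-1952/15-Feb-1953", "30/10/1990"):
        print(example, is_supported_collection_date(example))
```

This only checks the shape of the value; it does not confirm that the time is actually in UTC.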
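
The env_broad_scale, env_local_scale, and env_medium fields share the same value shape: one or more "label [ENVO:id]" terms separated by pipes. The sketch below splits such a value into (label, id) pairs. The function name parse_envo_terms and the regular expression are assumptions made for illustration; the pattern only checks the shape of each term, not whether the identifier exists in ENVO.

```python
import re

# Shape of a single term: free-text label, a space, then the ENVO ID in brackets.
TERM_PATTERN = re.compile(r"^(?P<label>.+?) \[(?P<id>ENVO:\d+)\]$")


def parse_envo_terms(value: str) -> list[tuple[str, str]]:
    """Split a pipe-separated ENVO field into (label, identifier) pairs."""
    terms = []
    for raw in value.split("|"):
        raw = raw.strip()
        if not raw:
            # Tolerate repeated pipes ("one or more pipes").
            continue
        match = TERM_PATTERN.match(raw)
        if match is None:
            raise ValueError(f"not in 'label [ENVO:id]' form: {raw!r}")
        terms.append((match.group("label"), match.group("id")))
    return terms


if __name__ == "__main__":
    print(parse_envo_terms("mangrove biome [ENVO:01000181]|estuarine biome [ENVO:01000020]"))
```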
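
The bioproject_accession definition states that a valid accession has the prefix PRJN, PRJE or PRJD, e.g., PRJNA12345. A simple shape check might look like the sketch below. The function name is hypothetical, and the assumption that the prefix is followed by one uppercase letter and then digits is inferred from the single example, not from an NCBI specification.

```python
import re

# Assumed shape: PRJ + N/E/D + one uppercase letter + digits (inferred from "PRJNA12345").
BIOPROJECT_PATTERN = re.compile(r"PRJ[NED][A-Z]\d+")


def is_valid_bioproject_accession(accession: str) -> bool:
    """Return True if the accession has the expected BioProject shape."""
    return BIOPROJECT_PATTERN.fullmatch(accession.strip()) is not None


if __name__ == "__main__":
    print(is_valid_bioproject_accession("PRJNA12345"))
```

A check like this catches pasted BioSample accessions or typos before submission; it cannot confirm that the BioProject exists.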
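
The diss_org_nitro definition describes dissolved organic nitrogen as total dissolved nitrogen minus NH4, NO3, and NO2. The arithmetic can be sketched as below, assuming all inputs are concentrations in the same unit and expressed as nitrogen. The function names are hypothetical, and the values in the usage example are made up for illustration only.

```python
def dissolved_organic_nitrogen(tot_diss_nitro: float, ammonium: float,
                               nitrate: float, nitrite: float) -> float:
    """diss_org_nitro = total dissolved nitrogen - NH4 - NO3 - NO2."""
    return tot_diss_nitro - ammonium - nitrate - nitrite


def total_dissolved_nitrogen(ammonium: float, nitrate: float,
                             nitrite: float, diss_org_nitro: float) -> float:
    """Inverse relation: total dissolved nitrogen = NH4 + NO3 + NO2 + dissolved organic nitrogen."""
    return ammonium + nitrate + nitrite + diss_org_nitro


if __name__ == "__main__":
    # Illustrative values only, all in the same (unspecified) unit.
    don = dissolved_organic_nitrogen(10.0, 1.5, 2.0, 0.5)
    print(don)
    print(total_dissolved_nitrogen(1.5, 2.0, 0.5, don))
```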