water_sample_data Sheet
Contextual data about the samples collected: when and where each sample was collected, what kind of sample it is, and the properties of the environment or experimental condition from which it was taken. Each row is a distinct sample. Most of this information is recorded during sample collection. Many terms use a controlled vocabulary, such as organism, env_broad_scale, and waterBody. This file contains the information that is submitted to NCBI when generating a BioSample. Other important fields for metadata processing include amplicon_sequenced, which helps link together different types of metadata. This sheet contains terms from the MIMARKS survey water 6.0 package. For other types of samples (e.g., sediment), use the appropriate template file.
| Term | definition | required_by |
|---|---|---|
| sample_name | Sample Name is a name that you choose for the sample. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. Every Sample Name from a single Submitter must be unique. Suggested format: PROJECT_REGION_STATION_DEPTH_REPLICATE | NCBI+OBIS |
| organism | Often "seawater metagenome" or "sediment metagenome". Use "synthetic metagenome" for mock communities. The most descriptive organism name for this sample (to the species, if possible). It is OK to submit an organism name that is not in our database. In the case of a new species, provide the desired organism name, and our taxonomists may assign a provisional taxID. In the case of unidentified species, choose the appropriate Genus and include 'sp.', e.g., "Escherichia sp.". When sequencing a genome from a non-metagenomic source, include a strain or isolate name too, e.g., "Pseudomonas sp. UK4". For environmental or microbiome samples, use the metagenomes taxonomy (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169). For more information about providing a valid organism, including new species, metagenomes (microbiomes) and metagenome-assembled genomes, see https://www.ncbi.nlm.nih.gov/biosample/docs/organism/. | NCBI+OBIS |
| collection_date | the date on which the sample was collected in UTC; date/time ranges are supported by providing two dates from among the supported value formats, delimited by a forward-slash character; collection times are supported by adding "T", then the hour and minute after the date, and must be in Coordinated Universal Time (UTC), otherwise known as "Zulu Time" (Z); supported formats include "DD-Mmm-YYYY", "Mmm-YYYY", "YYYY" or ISO 8601 standard "YYYY-mm-dd", "YYYY-mm", "YYYY-mm-ddThh:mm:ss"; e.g., 30-Oct-1990, Oct-1990, 1990, 1990-10-30, 1990-10, 21-Oct-1952/15-Feb-1953, 2015-10-11T17:53:03Z; valid non-ISO dates will be automatically transformed to ISO format | NCBI+OBIS |
| depth | Depth is defined as the vertical distance below surface. Depth can be reported as an interval for subsurface samples. Provide depth in meters in the format {float} {unit}, e.g., "5 m". | NCBI+OBIS |
| env_broad_scale | Add terms that identify the major environment type(s) where your sample was collected. Recommend subclasses of biome [ENVO:00000428]. https://ontobee.org/ontology/ENVO?iri=http://purl.obolibrary.org/obo/ENVO_00000428 Multiple terms can be separated by one or more pipes, e.g.: mangrove biome [ENVO:01000181]|estuarine biome [ENVO:01000020] | NCBI+OBIS |
| env_local_scale | Add terms that identify environmental entities having causal influences upon the entity at time of sampling. Please use terms that are present in ENVO and which are of smaller spatial grain than your entry for env_broad_scale. Multiple terms can be separated by pipes, e.g.: shoreline [ENVO:00000486]|intertidal zone [ENVO:00000316] | NCBI+OBIS |
| env_medium | Add terms that identify the material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. http://purl.obolibrary.org/obo/ENVO_00010483 Multiple terms can be separated by pipes e.g.: estuarine water [ENVO:01000301]|estuarine mud [ENVO:00002160] | NCBI+OBIS |
| geo_loc_name | Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg "Canada: Vancouver" or "Germany: halfway down Zugspitze, Alps" | NCBI+OBIS |
| lat_lon | The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format "d[d.dddd] N|S d[dd.dddd] W|E", e.g., 38.98 N 77.11 W. Example spreadsheet formula (adjust the cell references to your sheet): =CONCATENATE(AA19," N ",REPLACE(AB9,1,1,"")," E") | NCBI+OBIS |
| description | Description of the sample | Optional |
| serial_number | Specific to NOAA Omics, the serial number or unique id associated with the sample | Recommended |
| extract_number | Unique identifier of sample used for extraction and/or sequencing. Recommended format: Plate and well position of the DNA extraction. | Recommended |
| project_id | Internal short id for organizing projects. | Recommended |
| source_mat_id | A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID). This identifier refers to the original material collected. We use it to identify CTD bottles or sediment trap cups, i.e., the source material before it was separated into biological or technical replicates. | Recommended |
| bioproject_accession | The accession number of the BioProject(s) to which the BioSample belongs. If the BioSample belongs to more than one BioProject, enter multiple bioproject_accession columns. A valid BioProject accession has prefix PRJN, PRJE or PRJD, e.g., PRJNA12345. | Recommended |
| biosample_accession | BioSample accession from NCBI, provided after creating a biosample on NCBI, such as during the SRA submission process | Recommended |
| amplicon_sequenced | If amplicon metabarcoding was performed, list amplicons separated by a |. Each name MUST match the value provided for amplicon_sequenced on the prep_data sheet. If metabarcoding was not performed, list "not applicable". Only used for internal data management. | Recommended |
| metagenome_sequenced | If metagenomic sequencing was performed, put Yes. If not performed, put "not applicable". Only used for internal data management. | Recommended |
| collection_date_local | The date on which the sample was collected in local time in ISO format; date/time ranges are supported by providing two dates from among the supported value formats, delimited by a forward-slash character; collection times are supported by adding "T", then the hour and minute after the date. | Recommended |
| waterBody | The name of the water body in which the dcterms:Location occurs. Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. | Recommended |
| countryCode | Recommended best practice is to use an ISO 3166-1-alpha-2 country code. Recommended best practice is to use XZ if outside the EEZ (i.e. open seas). Lookup codes here: https://www.iso.org/obp/ui/#search | Recommended |
| decimalLatitude | The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive. | Recommended |
| decimalLongitude | The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive. | Recommended |
| geodeticDatum | The spatial reference system (SRS) upon which the geographic coordinates given in decimalLatitude and decimalLongitude are based. Often WGS84. https://dwc.tdwg.org/terms/#dwc:geodeticDatum | Recommended |
| samp_vol_we_dna_ext | Volume (ml) or mass (g) of total collected sample processed for DNA extraction | Recommended |
| samp_collect_device | The device used to collect an environmental sample. This field accepts terms listed under environmental sampling device (http://purl.obolibrary.org/obo/ENVO). This field also accepts terms listed under specimen collection device (http://purl.obolibrary.org/obo/GENEPIO_0002094) | Recommended |
| samp_mat_process | Any processing applied to the sample during or after retrieving the sample from environment. | Recommended |
| dna_conc | Concentration of DNA (weight ng/volume µl) | Recommended |
| concentrationUnit | Unit used for concentration measurement | Recommended |
| sample_type | Type of sample, can be: seawater, sediment, or various types of blanks | Recommended |
| collection_method | The name of, reference to, or description of the method or protocol used during a sampling Event. https://dwc.tdwg.org/terms/#dwc:samplingProtocol | Recommended |
| basisOfRecord | The specific nature of the data record - a subtype of the dcterms:type. For DNA-derived occurrences (see Category I and Category III), use MaterialSample. For enriched occurrences, use PreservedSpecimen or LivingSpecimen as appropriate. https://docs.gbif-uat.org/publishing-dna-derived-data/1.0/en/#mapping-metabarcoding-edna-and-barcoding-data | Recommended |
| sample_replicate | Required if your samples are biological replicates from a single water sample. | Optional |
| cruise_id | Identifier for the cruise, with year in parentheses. | Optional |
| line_id | Standard in OAP cruise management. Refers to the cruise line. For cruises without lines, this can be one of several cohesive regions where samples were collected. | Optional |
| station | Station ID if used during the cruise. | Optional |
| ctd_bottle_no | This column is important if you have replicate subsamples taken from a single water sample, so as to link those subsamples together. This can correspond to a specific niskin collection event (specific bottle at specific time and specific depth), or whatever was recorded in the field. When matching with OAP data, this is the Sample_ID. | Optional |
| biological_replicates | The other biological replicates (sample names) paired with the sample, separated by a pipe character (|). | Optional |
| sample_title | Title of the Biosample when seen on NCBI. Suggest a short descriptive name. | Optional |
| notes_sampling | notes about the sample not covered by other metadata. Can be used internally or submitted to NCBI | Optional |
| size_frac | Filtering pore size used in sample preparation | Optional |
| alkalinity | alkalinity, the ability of a solution to neutralize acids to the equivalence point of carbonate or bicarbonate | Optional |
| alkalinity_method | Method used for alkalinity measurement | Optional |
| alkyl_diethers | concentration of alkyl diethers | Optional |
| altitude | The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air. | Optional |
| aminopept_act | measurement of aminopeptidase activity | Optional |
| ammonium | concentration of ammonium | Optional |
| atmospheric_data | measurement of atmospheric data; can include multiple data | Optional |
| bac_prod | bacterial production in the water column measured by isotope uptake | Optional |
| bac_resp | measurement of bacterial respiration in the water column | Optional |
| bacteria_carb_prod | measurement of bacterial carbon production | Optional |
| biomass | amount of biomass; should include the name for the part of biomass measured, e.g. microbial, total. can include multiple measurements | Optional |
| bishomohopanol | concentration of bishomohopanol | Optional |
| bromide | concentration of bromide | Optional |
| calcium | concentration of calcium | Optional |
| carbo_nitro_ratio | ratio of amount or concentrations of carbon to nitrogen | Optional |
| chem_administration | list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603 | Optional |
| chloride | concentration of chloride | Optional |
| chlorophyll | concentration of chlorophyll | Optional |
| conduc | electrical conductivity of water | Optional |
| density | density of sample | Optional |
| diether_lipids | concentration of diether lipids; can include multiple types of diether lipids | Optional |
| diss_carb_dioxide | concentration of dissolved carbon dioxide | Optional |
| diss_hydrogen | concentration of dissolved hydrogen | Optional |
| diss_inorg_carb | dissolved inorganic carbon concentration | Optional |
| diss_inorg_nitro | concentration of dissolved inorganic nitrogen | Optional |
| diss_inorg_phosp | concentration of dissolved inorganic phosphorus | Optional |
| diss_org_carb | concentration of dissolved organic carbon | Optional |
| diss_org_nitro | dissolved organic nitrogen concentration, measured as: total dissolved nitrogen - NH4 - NO3 - NO2 | Optional |
| diss_oxygen | concentration of dissolved oxygen | Optional |
| down_par | visible waveband radiance and irradiance measurements in the water column | Optional |
| elev | The elevation of the sampling site as measured by the vertical distance from mean sea level. | Optional |
| fluor | raw or converted fluorescence of water | Optional |
| glucosidase_act | measurement of glucosidase activity | Optional |
| isolation_source | Describes the physical, environmental and/or local geographical source of the biological sample from which the sample was derived. | Optional |
| light_intensity | measurement of light intensity | Optional |
| magnesium | concentration of magnesium | Optional |
| mean_frict_vel | measurement of mean friction velocity | Optional |
| mean_peak_frict_vel | measurement of mean peak friction velocity | Optional |
| misc_param | any other measurement performed or parameter collected, that is not listed here | Optional |
| n_alkanes | concentration of n-alkanes; can include multiple n-alkanes | Optional |
| neg_cont_type | The substance or equipment used as a negative control in an investigation, e.g., distilled water, phosphate buffer, empty collection device, empty collection tube, DNA-free PCR mix, sterile swab, sterile syringe | Optional |
| nitrate | concentration of nitrate | Optional |
| nitrite | concentration of nitrite | Optional |
| nitro | concentration of nitrogen (total) | Optional |
| omics_observ_id | A unique identifier of the omics-enabled observatory (or comparable time series) your data derives from. This identifier should be provided by the OMICON ontology; if you require a new identifier for your time series, contact the ontology's developers. Information is available here: https://github.com/GLOMICON/omicon. This field is only applicable to records which derive from an omics time-series or observatory. | Optional |
| org_carb | concentration of organic carbon | Optional |
| org_matter | concentration of organic matter | Optional |
| org_nitro | concentration of organic nitrogen | Optional |
| organism_count | total count of any organism per gram or volume of sample, should include name of organism followed by count; can include multiple organism counts | Optional |
| oxy_stat_samp | oxygenation status of sample | Optional |
| part_org_carb | concentration of particulate organic carbon | Optional |
| part_org_nitro | concentration of particulate organic nitrogen | Optional |
| perturbation | type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types | Optional |
| petroleum_hydrocarb | concentration of petroleum hydrocarbon | Optional |
| pH | pH measurement | Optional |
| phaeopigments | concentration of phaeopigments; can include multiple phaeopigments | Optional |
| phosphate | concentration of phosphate | Optional |
| phosplipid_fatt_acid | concentration of phospholipid fatty acids; can include multiple values | Optional |
| photon_flux | measurement of photon flux | Optional |
| pos_cont_type | The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive | Optional |
| potassium | concentration of potassium | Optional |
| pressure | pressure to which the sample is subject, in atmospheres | Optional |
| primary_prod | measurement of primary production | Optional |
| redox_potential | redox potential, measured relative to a hydrogen cell, indicating oxidation or reduction potential | Optional |
| rel_to_oxygen | Is this organism an aerobe or an anaerobe? Please note that aerobic and anaerobic are valid descriptors for microbial environments, e.g., aerobe, anaerobe, facultative, microaerophilic, microanaerobe, obligate aerobe, obligate anaerobe, missing, not applicable, not collected, not provided, restricted access | Optional |
| salinity | salinity measurement | Optional |
| samp_store_dur | Duration for which the sample was stored. Indicate the duration for which the sample was stored written in ISO 8601 format | Optional |
| samp_store_loc | Location at which sample was stored, usually name of a specific freezer/room | Optional |
| samp_store_temp | Temperature at which sample was stored, e.g. -80 degree Celsius | Optional |
| silicate | concentration of silicate | Optional |
| size_frac_low | Refers to the mesh/pore size used to pre-filter/pre-sort the sample. Materials larger than the size threshold are excluded from the sample | Optional |
| size_frac_up | Refers to the mesh/pore size used to retain the sample. Materials smaller than the size threshold are excluded from the sample | Optional |
| sodium | sodium concentration | Optional |
| soluble_react_phosp | concentration of soluble reactive phosphorus | Optional |
| source_material_id | unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. | Optional |
| sulfate | concentration of sulfate | Optional |
| sulfide | concentration of sulfide | Optional |
| suspend_part_matter | concentration of suspended particulate matter | Optional |
| temp | temperature of the sample at time of sampling | Optional |
| tidal_stage | stage of tide | Optional |
| tot_depth_water_col | measurement of total depth of water column | Optional |
| tot_diss_nitro | total dissolved nitrogen concentration, reported as nitrogen, measured by: total dissolved nitrogen = NH4 + NO3 + NO2 + dissolved organic nitrogen | Optional |
| tot_inorg_nitro | total inorganic nitrogen content | Optional |
| tot_nitro | total nitrogen content of the sample | Optional |
| tot_part_carb | total particulate carbon content | Optional |
| tot_phosp | total phosphorus concentration, calculated by: total phosphorus = total dissolved phosphorus + particulate phosphorus. Can also be measured without filtering, reported as phosphorus | Optional |
| turbidity | turbidity measurement | Optional |
| water_current | measurement of magnitude and direction of flow within a fluid | Optional |
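
Two of the required fields above, lat_lon and collection_date, have strict text formats that are easy to get wrong when filling in the sheet by hand. The following Python sketch shows one way to build a lat_lon value from decimalLatitude and decimalLongitude and to do a loose pre-submission check of collection_date against the formats listed in the table. The function names `format_lat_lon` and `is_valid_collection_date` are hypothetical helpers, not part of any NCBI or template tooling, and the date check assumes English month abbreviations (C/English locale). It is not a substitute for NCBI's own validation.

```python
from datetime import datetime

# Formats listed in the collection_date definition (trailing "Z" is optional).
# Assumption: month abbreviations are parsed in an English/C locale.
_DATE_FORMATS = (
    "%d-%b-%Y",            # 30-Oct-1990
    "%b-%Y",               # Oct-1990
    "%Y",                  # 1990
    "%Y-%m-%d",            # 1990-10-30
    "%Y-%m",               # 1990-10
    "%Y-%m-%dT%H:%M:%S",   # 2015-10-11T17:53:03
    "%Y-%m-%dT%H:%M:%SZ",  # 2015-10-11T17:53:03Z
)


def format_lat_lon(decimal_latitude: float, decimal_longitude: float,
                   places: int = 4) -> str:
    """Build a lat_lon string such as '38.98 N 77.11 W' from signed decimals."""
    if not -90 <= decimal_latitude <= 90:
        raise ValueError("decimalLatitude must lie between -90 and 90")
    if not -180 <= decimal_longitude <= 180:
        raise ValueError("decimalLongitude must lie between -180 and 180")
    # Positive latitude is north; positive longitude is east.
    lat_hemisphere = "N" if decimal_latitude >= 0 else "S"
    lon_hemisphere = "E" if decimal_longitude >= 0 else "W"
    return (f"{abs(decimal_latitude):.{places}f} {lat_hemisphere} "
            f"{abs(decimal_longitude):.{places}f} {lon_hemisphere}")


def _matches_any_format(value: str) -> bool:
    """Return True if value parses with one of the listed date formats."""
    for fmt in _DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_valid_collection_date(value: str) -> bool:
    """Loosely check a single date or a 'start/end' range."""
    parts = value.strip().split("/")
    if len(parts) > 2:  # a range has exactly two dates
        return False
    return all(_matches_any_format(part) for part in parts)


if __name__ == "__main__":
    print(format_lat_lon(38.98, -77.11, places=2))
    print(is_valid_collection_date("21-Oct-1952/15-Feb-1953"))
```

Deriving the hemisphere letter from the sign of each coordinate avoids hard-coding "N" or "E", which would silently produce wrong values for southern or western sites.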