GLAMR Imports
This page provides an import overview.
Sample metadata
Spreadsheet
Information about samples in GLAMR is collected primarily in an Excel spreadsheet that is accessed via Google Sheets. This sheet serves as the primary authority on sample IDs (see below) and is used to import data for processing by the GLAMR bioinformatics pipelines. It is also the main record for most sample metadata, such as collection time, location, and associated environmental measurements. Because of this central role, edit access is restricted to those who have been trained on GLAMR data entry.
GLAMR-specific IDs
Bio-samples: IDs follow the format “bios_{numeric}”
- These are analogous to BioSamples in NCBI SRA and contain
Observations: IDs follow the format “samp_{numeric}”
- These are analogous to runs in NCBI SRA. Each is a unique observation of a bio-sample and must be associated with a bios_{numeric} ID. In most cases an observation is a unique sequencing effort, e.g. a metagenome, an amplicon target, or transcriptomic data. A single bio-sample may have multiple observations associated with it to account for different sequencing methodologies or primer sets. Observations can also represent results from other assays, such as metabolomics.
Sets / studies / projects: IDs follow the format “set_{numeric}”
- Bio-samples can be grouped into sets, typically associated with a particular project or paper.
Papers: IDs follow the format “paper_{numeric}”
- Whenever possible, samples are linked to associated papers.
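The ID formats above can be checked mechanically. The following is a minimal sketch, assuming that {numeric} means one or more ASCII digits and that no zero-padding rule applies; the function name parse_glamr_id and the ID_KINDS mapping are hypothetical and not part of the GLAMR codebase.

```python
import re

# Assumed mapping from ID prefix to the kind of record it names.
# The prefixes come from the formats listed above; the labels are illustrative.
ID_KINDS = {
    "bios": "bio-sample",
    "samp": "observation",
    "set": "set",
    "paper": "paper",
}

# Assumption: {numeric} is one or more ASCII digits with nothing after it.
ID_PATTERN = re.compile(r"(bios|samp|set|paper)_([0-9]+)")


def parse_glamr_id(identifier: str) -> tuple[str, int]:
    """Return (kind, number) for a well-formed ID, or raise ValueError."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a recognized GLAMR ID: {identifier!r}")
    prefix, number = match.groups()
    return ID_KINDS[prefix], int(number)
```

For example, parse_glamr_id("samp_447") returns ("observation", 447), while a string with a stray trailing character such as "samp_447)" raises ValueError.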
Local Import
Content for importing local data goes here…
NCBI SRA Import
Content for importing from NCBI SRA goes here…
Directory Structure
Samples
GLAMR primarily follows a sample-centric workflow, and thus files are organized into sample directories. These are stored in data/omics/{sample_type}s/{SampleID}
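As an illustration of this layout, the path for a sample directory could be built as follows. This is a sketch only: the helper name sample_dir, the default root of "data", and the example sample type "metagenome" are assumptions, not names taken from the GLAMR pipelines.

```python
from pathlib import Path


def sample_dir(sample_type: str, sample_id: str, root: Path = Path("data")) -> Path:
    """Build data/omics/{sample_type}s/{SampleID} relative to root.

    The sample type is pluralized by appending "s", as in the pattern above.
    """
    return root / "omics" / f"{sample_type}s" / sample_id
```

With these assumptions, sample_dir("metagenome", "samp_447") gives the relative path data/omics/metagenomes/samp_447.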
Projects
To organize samples and facilitate easier browsing, sample directories are also symbolically linked into project folders using this file structure: data/projects/{sample_type}s/{SampleID}
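The linking step might look like the sketch below. It follows the path pattern exactly as written above; the function name link_sample, the default root, and the choice to link to the resolved absolute path are assumptions, and the actual GLAMR tooling may create these links differently.

```python
from pathlib import Path


def link_sample(sample_type: str, sample_id: str, root: Path = Path("data")) -> Path:
    """Symlink a sample directory into the projects tree and return the link path."""
    # Existing sample directory: data/omics/{sample_type}s/{SampleID}
    target = (root / "omics" / f"{sample_type}s" / sample_id).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"sample directory not found: {target}")

    # Link location: data/projects/{sample_type}s/{SampleID}
    link = root / "projects" / f"{sample_type}s" / sample_id
    link.parent.mkdir(parents=True, exist_ok=True)

    # Leave an existing symlink in place so the function can be re-run safely.
    if not link.is_symlink():
        link.symlink_to(target, target_is_directory=True)
    return link
```

Because the link is only created when it does not already exist, calling the function twice for the same sample is harmless.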
Reference data
Reference data used by the GLAMR pipelines is stored in data/reference, including:
Taxonomic annotation:
kraken
sourmash
Functional annotation:
koFamScan
UniRef100
Bakta
Bin QC and annotation:
CheckM
checkm2
GTDBtk
semibin
gunc
Specialized annotations:
deeparg
genomad
virsorter
AntiSmash
BiG-SCAPE
Custom BLAST queries
Per-sample outputs
Key pipeline outputs and their locations are described on the pages for the respective data types.