

Using a collection of genomes (recommended)

The recommended workflow for running inStrain:

1. Assemble reads into contigs for each sample collected from the environment.
   Recommended software: IDBA_UD, MEGAHIT, metaSPADES.
2. Bin genomes out of each assembly using differential coverage binning.
   Recommended software: Bowtie2 (for mapping), MetaBAT, CONCOCT, DasTOOL (for binning).
3. Dereplicate the entire set of genomes that you would like to profile (all genomes from all environments) at 97-99% identity, and filter out low quality genomes.
4. Create a bowtie2 index of the representative genomes from this dereplicated set and map reads from each sample to this set.
   Recommended software: Bowtie2.
5. Use inStrain genome_wide to calculate genome-level microdiversity metrics for each originally binned genome.

The most important aspect of this workflow is to map to many genomes at once. Mapping to just one genome at a time is highly discouraged, because this encourages mismapped reads from other genomes to be recruited by this genome. By including many (dereplicated) genomes in your bowtie2 index, you will be able to far more accurately filter out mismapped reads and reduce false positive SNPs.

For more information on this, see choosing_parameters.

Read pairs must pass the following filters:

- Pairs must be mapped in the proper orientation with an expected insert size. The minimum insert distance can be set with a command line parameter. The maximum insert distance is a multiple of the median insert distance. So if pairs have a median insert size of 500bp, by default all pairs with insert sizes over 1500bp will be excluded.
- MapQ scores are confusing, and how they are calculated varies with the mapping algorithm being used, but they are meant to represent both the number of mismatches in the mapping and how unique that mapping is. With bowtie2, if the read maps equally well to two positions on the genome, its mapQ score will be set to 2. The read in the pair with the higher mapQ is used for the pair.
- Pairs must be above some minimum nucleotide identity (ANI) value.

For each pair harboring a SNP, calculate the linkage of that SNP with other SNPs within that same pair. This is only done for pairs of SNPs that are both on at least MIN_SNP reads.

Finally, this information is stored as an IS_profile object. This includes the locations of SNPs, the number of read pairs that passed filters (and other information) for each scaffold, the linkage between SNV pairs, etc. It also includes things like the overall coverage, breadth of coverage, average nucleotide identity (ANI) between the reads and the reference genome, and the expected breadth of coverage based on that true coverage.

  -h, --help            show this help message and exit
  -d, --debug           Make extra debugging output (default: False)
  -l min_read_ani, --min_read_ani min_read_ani
                        Minimum percent identity of read pairs to consensus to
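The read-pair filters described in this section (insert size, mapQ, and ANI) can be illustrated with a small sketch. This is not inStrain's code: the `ReadPair` class, the `filter_pairs` function, and every default value are hypothetical and chosen only for illustration. The one number taken from the text is the maximum insert size of 3 times the median, which follows from the 500bp and 1500bp example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReadPair:
    """Hypothetical summary of one mapped read pair (not an inStrain type)."""
    insert_size: int   # distance spanned by the pair, in bp
    mapq_read1: int    # mapQ of the first read
    mapq_read2: int    # mapQ of the second read
    ani: float         # identity of the pair to the reference, 0 to 1


def filter_pairs(pairs, min_insert=50, max_insert_multiple=3.0,
                 min_mapq=2, min_ani=0.95):
    """Return the pairs that pass all three filters.

    All defaults are illustrative assumptions, not documented inStrain values.
    """
    if not pairs:
        return []

    # The maximum insert size is a multiple of the median over all pairs.
    max_insert = max_insert_multiple * median(p.insert_size for p in pairs)

    kept = []
    for p in pairs:
        # Filter 1: insert size must fall inside the expected range.
        if p.insert_size < min_insert or p.insert_size > max_insert:
            continue
        # Filter 2: the higher mapQ of the two reads represents the pair.
        if max(p.mapq_read1, p.mapq_read2) < min_mapq:
            continue
        # Filter 3: identity must be strictly above the minimum.
        if p.ani <= min_ani:
            continue
        kept.append(p)
    return kept
```

With a median insert size of 500bp, a pair with an insert size of 1600bp is dropped by the first filter, which matches the 1500bp cutoff in the example above.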
