A genome tag is a four-character string of letters and/or numbers that forms the
beginning of every gene name in your project. The tag is also used
as a short name to identify the project itself.
An example of a genome tag is NCAS, which we used for the Naumovozyma
castellii genome project. Genes from this genome have names like
NCAS0A01230 and NCAS0E02150.
A genome tag can be any combination of letters and/or numbers, but it must be
exactly 4 characters long.
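
The rule above can be checked with a few lines of Python. This is only a minimal sketch and is not part of YGAP: the function name is_valid_genome_tag is hypothetical, and it assumes that "letters and/or numbers" means ASCII letters and digits in either case.

```python
import re

# Assumption: a tag is exactly four ASCII letters and/or digits.
# Whether lowercase letters are accepted by YGAP is not stated; this sketch allows them.
TAG_PATTERN = re.compile(r"[A-Za-z0-9]{4}")


def is_valid_genome_tag(tag: str) -> bool:
    """Return True if tag is exactly four ASCII letters/digits."""
    return TAG_PATTERN.fullmatch(tag) is not None
```

With this sketch, is_valid_genome_tag("NCAS") is true, while tags of the wrong length or containing punctuation are rejected.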
Gene names are built from the genome tag. For example, the gene name
NCAS0E02150 consists of these parts:
- NCAS is the genome tag for the species Naumovozyma castellii.
- 0 after the tag is a mandatory character (reserved for possible future use).
- E indicates that this gene is on the 5th scaffold (E is the 5th letter of
the alphabet). If you choose the option to sort your scaffolds by
size, the letter A is used for the largest scaffold, B for the
second-largest, and so on in descending order of size. If you do not choose
this option, the letters A, B, C, etc. refer to the scaffolds in
the order in which they appear in your input Scaffolds file. If
there are more than 26 scaffolds in your Scaffolds file, YGAP
uses two letters to indicate the scaffold, e.g. NCAS0AG01230; the
maximum number of scaffolds allowed is 676 (= 26*26).
- 02150 is a 5-digit number that identifies the gene. Genes are numbered
sequentially from the beginning of each scaffold. The numbers assigned
by YGAP increase in increments of 10 to allow room for future
improvements of the annotation; for example, if a new gene is later
discovered between NCAS0E02150 and NCAS0E02160, it could be called
NCAS0E02155.
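
The parts listed above can be pulled out of a gene name with a short Python sketch. This is an illustration only, not YGAP's own code: the names GeneName and parse_gene_name are hypothetical, scaffold letters are assumed to be uppercase, and the scaffold position is computed only for one-letter codes because the ordering of two-letter codes is not described here.

```python
import re
from typing import NamedTuple, Optional


class GeneName(NamedTuple):
    tag: str                      # four-character genome tag, e.g. "NCAS"
    scaffold: str                 # one or two scaffold letters, e.g. "E" or "AG"
    scaffold_rank: Optional[int]  # 1-based position for one-letter codes, else None
    number: int                   # the 5-digit gene number as an integer


# Tag (4 letters/digits), the mandatory "0", 1-2 scaffold letters, 5 digits.
GENE_NAME = re.compile(r"([A-Za-z0-9]{4})0([A-Z]{1,2})([0-9]{5})")


def parse_gene_name(name: str) -> GeneName:
    """Split a gene name such as NCAS0E02150 into its parts."""
    match = GENE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised gene name: {name!r}")
    tag, scaffold, digits = match.groups()
    # A -> 1, B -> 2, ... for single letters; two-letter codes are left unranked.
    rank = ord(scaffold) - ord("A") + 1 if len(scaffold) == 1 else None
    return GeneName(tag, scaffold, rank, int(digits))
```

For NCAS0E02150 this gives the tag NCAS, scaffold E (5th scaffold) and gene number 2150; for NCAS0AG01230 the scaffold is reported as AG with no rank.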