README.md

List of Folders


.
├── transcript_families
├── cdna_sequences
├── exonic_regions
├── gene_homologies
├── gene_sequences_alg_and_blocks
└── README.md

transcript_families

The transcript_families folder contains a single .tar.gz archive that groups all the computed transcript families by gene family. The archive covers more than 50,000 gene families from version 111 of the Ensembl database.


.
└── families.tar.gz
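
The internal layout of families.tar.gz is not described here, so a reasonable first step is to list a few of its members before extracting anything. The following is a minimal sketch using only the Python standard library; the function name `list_family_members` and the `limit` parameter are illustrative and not part of TranscriptDB.

```python
import tarfile


def list_family_members(archive_path, limit=10):
    """Return the names of the first `limit` regular files in a .tar.gz archive.

    Nothing is assumed about how the archive is organized internally;
    the member names are simply reported as stored.
    """
    names = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                names.append(member.name)
                if len(names) >= limit:
                    break
    return names


if __name__ == "__main__":
    # Assumes the archive has been downloaded to the current directory.
    for name in list_family_members("families.tar.gz"):
        print(name)
```

Iterating over the archive instead of calling `getnames()` avoids reading the whole file when only a preview is needed, which may matter for an archive covering tens of thousands of families.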

cdna_sequences

The cdna_sequences folder contains a FASTA file for each species stored in TranscriptDB. These files summarize the cDNA sequences for each gene of the species.


│   ├── acanthochromis_polyacanthus
│   │   └── acanthochromis_polyacanthus.cdna.fasta.gz
│   ├── accipiter_nisus
│   │   └── accipiter_nisus.cdna.fasta.gz
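
The gzipped FASTA files can be read without decompressing them to disk first. The sketch below assumes only the generic FASTA convention (a header line starting with `>` followed by one or more sequence lines); it does not assume any particular header layout, and the function name `read_fasta_gz` is illustrative.

```python
import gzip


def read_fasta_gz(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Path follows the per-species naming shown in the listing above.
    path = "cdna_sequences/accipiter_nisus/accipiter_nisus.cdna.fasta.gz"
    for header, sequence in read_fasta_gz(path):
        print(header, len(sequence))
        break
```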

exonic_regions

The exonic_regions folder contains a GFF3 file for each species stored in TranscriptDB. These files list the positions of the exons that make up each gene transcript. The folder also includes a description of the format used in the GFF3 files.


│   ├── bison_bison_bison
│   │   └── bison_bison_bison.gff3.gz
│   ├── bos_grunniens
│   │   └── bos_grunniens.gff3.gz
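
As a rough illustration, the files can be parsed as below. This sketch assumes the standard nine tab-separated GFF3 columns and `key=value` attributes separated by semicolons; the format description shipped in the folder takes precedence if it differs. The names `read_gff3_gz` and `exon_length` are illustrative.

```python
import gzip
from urllib.parse import unquote

# Column names from the generic GFF3 specification (assumed to apply here).
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def read_gff3_gz(path):
    """Yield one dict per feature line of a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            fields = line.split("\t")
            if len(fields) != len(GFF3_COLUMNS):
                continue  # skip lines that are not nine-column features
            record = dict(zip(GFF3_COLUMNS, fields))
            record["start"] = int(record["start"])
            record["end"] = int(record["end"])
            attributes = {}
            for item in record["attributes"].split(";"):
                if "=" in item:
                    key, value = item.split("=", 1)
                    attributes[key] = unquote(value)  # GFF3 percent-encoding
            record["attributes"] = attributes
            yield record


def exon_length(record):
    """Length of a feature; GFF3 coordinates are 1-based and inclusive."""
    return record["end"] - record["start"] + 1


if __name__ == "__main__":
    path = "exonic_regions/bos_grunniens/bos_grunniens.gff3.gz"
    for record in read_gff3_gz(path):
        print(record["seqid"], record["start"], record["end"], exon_length(record))
        break
```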

gene_homologies

The gene_homologies folder contains a gzipped JSON file for each species, detailing the types of relationships among the species' genes within their respective gene families.


│   ├── canis_lupus_dingo
│   │   └── canis_lupus_dingo.json.gz
│   ├── canis_lupus_familiaris
│   │   └── canis_lupus_familiaris.json.gz
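
The JSON schema is not documented in this README, so the sketch below only loads a file and reports its top-level shape, which is a safe way to start exploring it. The function names `load_homologies` and `describe_top_level` are illustrative.

```python
import gzip
import json


def load_homologies(path):
    """Load a gzip-compressed JSON file and return the decoded object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def describe_top_level(data):
    """Return (type name, number of entries) for the decoded JSON object.

    The entry count is None for values that have no length (numbers, null).
    """
    size = len(data) if isinstance(data, (dict, list, str)) else None
    return type(data).__name__, size


if __name__ == "__main__":
    path = "gene_homologies/canis_lupus_dingo/canis_lupus_dingo.json.gz"
    data = load_homologies(path)
    print(describe_top_level(data))
```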

gene_sequences_alg_and_blocks

The gene_sequences_alg_and_blocks folder contains, for each species stored in TranscriptDB, the block decomposition computed for each gene of that species, together with the alignment of each gene's transcribed sequences. Alignments are computed with either MACSE or Kalign, depending on the size of the gene family, and all files are organized by gene family.


│   ├── serinus_canaria
│   │   ├── serinus_canaria.blocks_alg.fasta.gz
│   │   └── serinus_canaria.sequences_alg.fasta.gz
│   ├── seriola_dumerili
│   │   ├── seriola_dumerili.blocks_alg.fasta.gz
│   │   └── seriola_dumerili.sequences_alg.fasta.gz
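
Every per-species folder in the listings above follows the same `<folder>/<species>/<species>.<suffix>` naming pattern, so the expected paths for a species can be built programmatically. This is a small sketch under that assumption; the `root` argument (wherever the data was downloaded) and the function name `species_files` are hypothetical.

```python
from pathlib import PurePosixPath

# File suffixes per folder, taken from the directory listings in this README.
SUFFIXES = {
    "cdna_sequences": ["cdna.fasta.gz"],
    "exonic_regions": ["gff3.gz"],
    "gene_homologies": ["json.gz"],
    "gene_sequences_alg_and_blocks": [
        "blocks_alg.fasta.gz",
        "sequences_alg.fasta.gz",
    ],
}


def species_files(root, species):
    """Map each per-species folder to the file paths expected for `species`."""
    root = PurePosixPath(root)
    return {
        folder: [root / folder / species / f"{species}.{suffix}" for suffix in suffixes]
        for folder, suffixes in SUFFIXES.items()
    }


if __name__ == "__main__":
    for folder, paths in species_files(".", "serinus_canaria").items():
        for path in paths:
            print(path)
```

The function only constructs paths; it does not check that a given species is actually present in every folder.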

TranscriptDB