Doctor of Philosophy
School of Chemistry
Barrett, Jeffrey R., Pragmatic protein domain identification, Doctor of Philosophy thesis, School of Chemistry, University of Wollongong, 2014. https://ro.uow.edu.au/theses/4141
Studying proteins is hard. Even in well-studied model systems, some proteins are recalcitrant to production in useful amounts in soluble form. These proteins can be difficult to over-express, insoluble, or both, and the lack of solubility is often attributed to improper protein folding. In general, however, small proteins are easier to express in soluble form than large proteins.
Protein evolution has produced many modular multi-domain proteins made from several smaller folded domains in series, as it is more efficient to fuse established functional domains together than to construct a large protein de novo. It has long been known that distinct domains of large proteins can be over-expressed more easily than the full-length protein, and many full-length proteins have been studied by over-expressing their domains separately.
This thesis presents a new pragmatic methodology for truncating proteins and identifying their soluble fragments. The technique was used to identify previously unattainable soluble domain constructs of proteins of interest to our research group. It uses exonuclease III to delete a protein gene from one end in a specially constructed plasmid. Deletion can be performed so that the protein is truncated from either its amino (N-) or carboxy (C-) terminus, producing a library of truncated protein genes. The truncated genes in these plasmids are fused to a downstream gene for either enhanced green fluorescent protein (EGFP) or dihydrofolate reductase (DHFR). The fused EGFP or DHFR gives cells expressing the fusion protein a distinct phenotype that depends on whether the truncated protein fusion is soluble. The technique allows pragmatic protein domain identification because a soluble truncated protein can be assumed to contain no incomplete domains, and thus to be truncated at a domain boundary.
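The shape of such a truncation library can be illustrated in silico. The following is a minimal sketch only, not the laboratory protocol of the thesis: it assumes that deletion endpoints can fall at any nucleotide position, that the reporter (EGFP or DHFR) is read correctly only when the number of deleted nucleotides is a multiple of three, and it uses a hypothetical toy gene. The function names `truncation_library` and `in_frame_fragments` are illustrative inventions.

```python
from typing import List, Tuple


def truncation_library(gene: str, end: str = "C",
                       min_codons: int = 1) -> List[Tuple[int, str, bool]]:
    """Enumerate every single-nucleotide deletion endpoint from one end of a gene.

    Returns (nucleotides_deleted, remaining_fragment, in_frame) tuples.
    Assumption: the full-length gene is in frame with the downstream reporter,
    so a truncation preserves the reading frame only when the number of
    deleted nucleotides is a multiple of three.
    """
    if len(gene) % 3 != 0:
        raise ValueError("gene length must be a whole number of codons")
    if end not in ("N", "C"):
        raise ValueError("end must be 'N' or 'C'")

    library = []
    max_deleted = len(gene) - 3 * min_codons
    for deleted in range(max_deleted + 1):
        if end == "C":
            # Deleting from the 3' end truncates the protein's C-terminus.
            fragment = gene[:len(gene) - deleted]
        else:
            # Deleting from the 5' end truncates the protein's N-terminus.
            fragment = gene[deleted:]
        library.append((deleted, fragment, deleted % 3 == 0))
    return library


def in_frame_fragments(library: List[Tuple[int, str, bool]]) -> List[str]:
    """Keep only the fragments that preserve the reading frame."""
    return [fragment for _, fragment, in_frame in library if in_frame]


# Hypothetical four-codon toy gene, for illustration only.
toy_gene = "ATGGCTGAAAAA"
c_library = truncation_library(toy_gene, end="C")
n_library = truncation_library(toy_gene, end="N")
print(in_frame_fragments(c_library))
print(in_frame_fragments(n_library))
```

Under these assumptions only about one in three members of a library can report on solubility at all, which is one reason a phenotypic screen over many clones is a natural fit for this kind of truncation.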