Those who study cancer generally agree that at its most basic level, cancer is a genetic disease. This does not mean that cancer is inherited: although several syndromes of familial cancer susceptibility exist, most cancers occur sporadically and are therefore not inherited in the true sense of the word. Rather, the statement implies that the biological events that initiate the malignancy represent alterations in the expression of genes. Furthermore, as cancers progress, alterations in gene expression tend to become more profound, and disruptions of the chromosomes, which carry the genes, often become quite extreme. Thus, cancer biologists are aggressively pursuing genetic studies of cancer tissues in order to identify the master genes controlling individual malignancies. Because tumors become more genetically disorganized as they progress, it is critical to separate the "wheat from the chaff". In other words, one needs to ignore those genetic abnormalities that happen to be present but are not functionally critical, and to identify those critical genes which cause the cancer and/or maintain the health and viability of the cancer cell.
Although sarcomas are often grouped together on the basis of their common ancestry from primitive mesenchymal progenitors (Figure 1) and their common clinical behavior, they are at the same time a very heterogeneous group of tumors.
Not surprisingly therefore, genetic studies thus far have confirmed that while the inner workings of sarcomas with common appearance under the microscope are similar, the inner workings of the different histologic subtypes of sarcoma are highly variable. For many sarcomas, unique chromosomal translocations serve as the driving force for oncogenesis, with a different translocation and a different path by which the translocation causes cancer in each histologic subtype. For other sarcomas, complex and widespread genetic disorganization is evident in the tumor cells, and therefore one cannot even begin to guess what might be the critical genetic event that gave rise to the tumor. Coming to grips with and delineating the genetic distinctions between individual sarcoma subtypes and identifying those genes critical for some or all sarcomas is essential for new targeted therapies to be developed for these diseases.
Historically, the genetic study of tumors (a field called molecular oncology) was largely conducted by individuals or small groups who generated and tested hypotheses regarding the importance of individual genes of interest, chosen on the basis of their known mechanisms of action. Such individual genes would typically be studied at length, one by one, using approaches that allowed investigators to identify whether the gene was present, absent, increased or decreased in various conditions. Although this approach has resulted in progress in cancer biology, it is difficult to study each individual sarcoma subtype with such labor-intensive techniques. Furthermore, because the human genome contains roughly 20,000 protein-coding genes (early estimates ran as high as 100,000), it is clear that the one-by-one approach cannot efficiently "mine the genome" for a new or unexpected gene that could play an important role.
Over the last several years, a variety of "high throughput" approaches have become available which can provide a comprehensive overview of gene expression in any given cell. Together, these technologies seek to profile gene expression within the cells, and they are collectively referred to as "expression profiling". Depending upon the exact methodology used, the technique may be called DNA microarray or GeneChip analysis. In general, these technologies are complex but highly reliable, and remarkably, they can be performed with very small amounts of tissue obtained from a routine clinical biopsy. Not surprisingly, expression profiling is now commonly used by molecular oncologists to study sarcomas, in an effort to jump-start our understanding of the biology of these complex, rare and life-threatening tumors for which, all too often, effective therapy is not available. In order to understand this technology, one must first understand the basics of gene expression.
The Basics of Gene Expression
Gene expression begins with the unwinding of double-stranded DNA to expose a single strand of DNA which contains the coding regions (exons) of a gene. By sequential pairing of nucleotides along the exposed strand, a new mRNA strand is created which is complementary in base pair sequence to the original sequence in the DNA. The mRNA travels to the cytoplasm, where it attaches to the ribosome. There, protein synthesis takes place by sequentially joining the amino acids encoded by the sequence of the mRNA.
DNA: deoxyribonucleic acid, double stranded archive of the entire genome, contains introns (non-coding elements of the genome) and exons (elements which serve as coding regions for gene expression)
RNA: ribonucleic acid, three types:
- tRNA: transfer RNA, plays a role in making proteins
- rRNA: ribosomal RNA, makes up ribosomes where translation takes place
- mRNA: messenger RNA, contains the coding sequence which determines which proteins are made, the template for translation
Transcription: transfer of genetic information from DNA to messenger RNA through a process of DNA unwinding and the creation of new mRNA
Translation: reading the nucleotide code carried in the sequence of the mRNA and assembling the correspondingly sequenced amino acids into a protein; takes place on the ribosomes.
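The two steps defined above, transcription and translation, can be sketched in a few lines of Python. This is a toy illustration rather than a biological simulation: the template sequence is hypothetical, the codon table lists only the five codons the example needs, and splicing of introns is ignored.

```python
# Toy model of transcription and translation (illustrative only).

# Partial codon table: only the codons used in this example are listed.
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": None,   # stop codon
}

# Base-pairing rules when RNA is built against a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Build the mRNA complementary to a DNA template strand.

    The template is written 3'->5', so the mRNA reads 5'->3' left to right.
    """
    return "".join(DNA_TO_RNA[base] for base in template_strand)


def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return protein


template = "TACCGAACCTTTATT"  # hypothetical template strand, 3'->5'
mrna = transcribe(template)
protein = translate(mrna)
print(mrna)     # AUGGCUUGGAAAUAA
print(protein)  # ['Met', 'Ala', 'Trp', 'Lys']
```

The same complementary base pairing that builds the mRNA here is what allows a labeled sample to bind to its matching spot on a microarray.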
DNA microarray analysis starts with a glass slide, upon which small stretches of DNA are affixed. Remarkably, stretches from up to 20,000 or so genes can be imprinted onto one individual slide. They are typically arranged neatly in rows so that each dot, representing an individual gene, can be identified according to its exact coordinates in the grid. The DNAs chosen by the investigator to be placed upon the slide represent portions of genes which are of potential interest in cancer biology, as well as some genetic stretches which have been found in tumors but for which the corresponding gene has not been identified. The investigator then takes the tumor of interest and extracts the RNA from the tumor using standard techniques. Any gene that was being expressed by the tumor at the time the biopsy was taken is expected to have RNA present in the sample. RNA tends to be unstable, and therefore a reaction is performed to generate the more stable complementary DNA (cDNA) from the RNA. This is a highly standardized and commonly used approach in molecular biology, and it essentially yields a stable copy of all expressed genes present in the starting sample. The cDNA is then labeled with a fluorescent dye and applied, in solution, onto the slide. If cDNA is present which is complementary to the DNA sequence of an individual gene on the slide, binding will take place. The slide is then washed, and those cDNAs which did not bind are removed. A laser then scans the slide, and any bound cDNA will appear green. In this way, one can get a very broad view of the genes which are expressed in a given tumor.
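The binding step described above can be illustrated with a small Python sketch. It is a deliberate simplification: the gene names, probe sequences and cDNA fragments are all hypothetical, and a cDNA is treated as "bound" only if it contains the exact reverse complement of the probe, whereas real hybridization depends on temperature, probe length and partial matches.

```python
# Toy model of single-sample hybridization to an array (illustrative only).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the DNA strand that would base-pair with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridize(array: dict, cdna_pool: list[str]) -> dict:
    """Map each grid coordinate to True if any cDNA binds the probe there.

    Simplifying assumption: a cDNA binds when it contains the exact
    reverse complement of the probe sequence.
    """
    bound = {}
    for coord, (gene, probe) in array.items():
        target = reverse_complement(probe)
        bound[coord] = any(target in cdna for cdna in cdna_pool)
    return bound


# Hypothetical probes keyed by (row, column) position on the slide.
array = {
    (0, 0): ("geneA", "ATGGCC"),
    (0, 1): ("geneB", "TTTGGG"),
    (1, 0): ("geneC", "CAGTCA"),
}

# Hypothetical labeled cDNAs generated from the tumor RNA.
cdna_pool = ["AAGGCCATTT", "GGTGACTGCC"]

spots = hybridize(array, cdna_pool)
for coord, lit in spots.items():
    gene = array[coord][0]
    print(coord, gene, "green" if lit else "dark")
```

In this example the spots for geneA and geneC light up and the spot for geneB stays dark, because no cDNA in the pool is complementary to its probe.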
Although the use of cDNA from one tumor sample labeled with one color can give some information, it does not give the investigator reliable information about how this gene expression would compare to a normal cell, or to another type of cancer cell. Therefore, in order to measure gene expression relative to some control sample, investigators most commonly apply a second source of cDNA to the slide at the same time. This could be cDNA from other tumors that appear similar under the microscope, in order to compare how similar the cells are genetically. It could also be cDNA from normal cells that represent the tissue of origin for the individual tumor (for instance, cDNA from normal muscle when performing a microarray analysis with rhabdomyosarcoma samples). The second cDNA source might also be derived from a tumor which is histologically the same but has differing clinical characteristics, such as more aggressive clinical behavior. Whatever the choice of the second cell type, it is handled essentially as described above, with one exception: the second cDNA is labeled with a red dye and therefore will show as a red spot after laser scanning. The computer then merges the image showing red spots with the one showing green spots, which results in a single image that identifies genes which are expressed in both samples (yellow, due to the merging of red and green), in neither (black), or over- or under-expressed in one versus the other (red or green). A theoretical example of how such a microarray looks is shown in Figure 2a, whereas an example of real data is shown in Figure 2b.
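The merging of the red and green images can be thought of as a per-spot comparison of two intensities. The Python sketch below shows one way this might be expressed; the function name, the background cutoff of 100 intensity units and the two-fold threshold are illustrative assumptions, not values taken from any particular scanner or analysis package.

```python
import math


def classify_spot(red: float, green: float,
                  background: float = 100.0,
                  fold: float = 2.0) -> str:
    """Name the merged color of one spot from its two channel intensities.

    background and fold are assumed, illustrative thresholds.
    """
    if red < background and green < background:
        return "black"   # expressed in neither sample
    # Floor each channel at 1.0 so the ratio is always defined.
    log_ratio = math.log2(max(red, 1.0) / max(green, 1.0))
    if log_ratio >= math.log2(fold):
        return "red"     # higher in the red-labeled (second) sample
    if log_ratio <= -math.log2(fold):
        return "green"   # higher in the green-labeled (tumor) sample
    return "yellow"      # similar expression in both samples


# Hypothetical (red, green) intensities for four spots.
readings = {
    "geneA": (50.0, 40.0),
    "geneB": (5000.0, 4800.0),
    "geneC": (8000.0, 500.0),
    "geneD": (300.0, 6000.0),
}

colors = {gene: classify_spot(r, g) for gene, (r, g) in readings.items()}
print(colors)
```

Working with the logarithm of the ratio treats over- and under-expression symmetrically: a two-fold increase and a two-fold decrease sit at +1 and -1 respectively.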
Although the principles involved in microarray analyses are simple and represent very straightforward applications of our current understanding of molecular biology, the technology required to accurately perform microarray analysis and to interpret the results is highly complex. Currently there are two basic types of microarray in use. One involves DNA-imprinted slides, as described above. The second, called GeneChip and developed by the Affymetrix company, uses oligonucleotides rather than DNA on the physical surface to which the cDNA binds, and the comparison of red vs. green cDNA occurs across slides rather than within one individual slide. There are multiple other commercially available kits for microarray, and each requires a careful approach so that appropriate controls are used to validate the results.
The basics of microarray technology: This animated tutorial helps one to visualize the mechanics of this remarkably simple yet powerful process. It was made by A. Malcolm Campbell in the Department of Biology at Davidson College.
Affymetrix GeneChip Array provides a Data Mining Tool Tutorial.
Using these techniques, much has been learned about gene expression in sarcomas. A few of the results will be highlighted here, but the interested reader is referred to a recent review by Greer et al. for a more detailed discussion.
- Khan and colleagues used microarray to compare gene expression across a variety of pediatric tumors which appear very similar under the microscope (small round blue cell tumors) but which are known to have distinct derivations and distinct clinical behaviors. Using microarray and complex approaches involving "artificial neural networks" which teach a computer how to sift through masses of data for critical elements, the computerized algorithm identified 96 genes which were able to allow correct classification of these tumors 100% of the time. This provided the important proof-of-principle that tumors which are clinically classified as an individual group show their own distinct patterns of gene expression, again confirming cancer as a genetic disease. Of course, one hopes that identification of such distinct patterns of genetic expression is the first step toward identifying critical genes which could be targeted with new drugs or other therapeutics.
- Nagayama and colleagues studied gene expression in 13 synovial sarcomas and 34 other spindle-cell sarcomas. By comparing the similarity of gene expression through a technique called "hierarchical clustering analysis", it was found that synovial sarcomas clustered closely with malignant peripheral nerve sheath tumors. These are the first data to suggest that these two tumor types, which appear to be derived from different mesenchymal tissues and which do not share a similar chromosomal translocation, might nevertheless have similar inner workings.
- Sjogren and co-workers evaluated extraskeletal myxoid chondrosarcomas (EMCs) with differing histologies and different chromosomal translocations. Despite these differences, the microarray revealed very similar gene expression and even provided the new insight that CHI3L1, a gene which encodes a protein involved in non-malignant conditions with disruption of extracellular matrix, is highly overexpressed in these tumors. This raises the suspicion that CHI3L1 could be a target for therapy in this disease and will no doubt give rise to subsequent more focused studies to understand the biology of this molecule in this tumor.
- Khanna et al. used microarray to compare gene expression in mouse osteosarcomas which were not highly metastatic vs. those which had a propensity to metastasize. They found that although many thousands of genes were checked, only a few were differentially expressed between these two tumor types, which varied in their aggressiveness. Among these, they identified ezrin as a molecule which was overexpressed in the highly metastatic type. They went on to show that lowering ezrin levels diminished the ability to metastasize, and also that clinical samples of osteosarcoma which had high levels of ezrin were associated with a poor prognosis. Thus, by sifting through thousands of genes using microarray, these investigators narrowed the field of potential perpetrators of metastases to a few genes, and have gone a long way toward proving that ezrin is one of the critical players in the metastatic process.
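The "hierarchical clustering analysis" mentioned in the studies above groups samples by repeatedly merging the two most similar expression profiles. The Python sketch below is a minimal illustration under stated assumptions: the sample names and log-ratio values are invented for the example, similarity is measured as one minus the Pearson correlation, and average linkage is used. It does not reproduce the specific software or settings of any of the cited studies.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(x: list[float], y: list[float]) -> float:
    """1 - Pearson correlation: near 0 for similar patterns, near 2 for opposite."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / (var_x * var_y) ** 0.5


def cluster(profiles: dict[str, list[float]]) -> list[tuple]:
    """Average-linkage agglomerative clustering; returns the merges in order."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average distance over all sample pairs spanning the two clusters.
            a, b = pair
            return mean(pearson_distance(profiles[i], profiles[j])
                        for i in a for j in b)
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Hypothetical log-ratio profiles (five genes) for four tumor samples.
profiles = {
    "A1": [2.0, 1.5, -1.0, -2.0, 0.1],
    "A2": [1.8, 1.7, -0.8, -2.2, 0.0],
    "B1": [-1.5, -2.0, 1.9, 1.0, 0.2],
    "B2": [-1.7, -1.8, 2.1, 1.2, -0.1],
}

merges = cluster(profiles)
for left, right in merges:
    print(left, "+", right)
```

With these invented profiles the two A samples pair up, the two B samples pair up, and the two pairs are joined only in the final step, which is the kind of structure a dendrogram displays.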
In summary, evaluation of gene expression in the complex and heterogeneous group of cancers called sarcomas is an area of vigorous research interest. This type of approach, which can simultaneously assess many thousands of genes from a very small amount of tumor tissue, appears to be the best way to narrow down the list of possible critical targets in these tumors. The technique is complex and requires careful controls and experienced investigators to be sure that the differences observed are real, and to efficiently sift through the immense amount of data which is generated. Despite these challenges, it is anticipated that such studies carried out during the course of this decade will ultimately allow the sarcoma community to go beyond the largely descriptive categorizations which we currently have, to those defined by a genetic signature which provides insight into the "inner workings" of these cells. It is hoped that the identification of critical genetic signatures will then lead to new therapeutic targets that will ultimately improve the outcome for patients with these diseases.
Additional Microarray Resources
- The National Human Genome Research Institute’s Website
- The Talking Glossary of Genetic Terms (listen to a detailed explanation of microarray technology).
- Cancer Gene Map article in "The Scientist" by Charles Choi
- The Functional Genomics Data Society
- The Stanford Microarray Database
- Stanford's GeneXpress Website
- MIT's Cancer Program Data Sets