Linking expression data to pathways

Say that you are a soybean geneticist and you have just run a gene expression experiment. You analyze your microarray data in R, or you use ExploRase, to identify differentially expressed (DE) genes.
You bring the DE genes into Excel and then export them as a .csv text file. You're going to use this file to correlate the genes with pathways in MetNet. Your goal is to find over-represented pathways in the gene list so that you can learn more about the conditions and cellular functions that your experiment affects.
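If you would rather script the export step than go through Excel, it could look something like the sketch below. It assumes that the upload expects one identifier per line, which you should check against the "My Lists" page; the probe IDs and the write_entity_list helper are made up for illustration.

```python
import csv
import io


def write_entity_list(entities, handle):
    """Write one entity per row, skipping blanks and duplicates (order is kept)."""
    seen = set()
    writer = csv.writer(handle, lineterminator="\n")
    for name in entities:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            writer.writerow([name])
    return len(seen)


# Hypothetical probe IDs standing in for your DE genes.
de_probes = ["probe_0001", "probe_0002", " probe_0002", "", "probe_0003"]

# An in-memory buffer keeps the example self-contained; for a real file use
# open("de_genes.csv", "w", newline="") instead.
buffer = io.StringIO()
count = write_entity_list(de_probes, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Cleaning up blanks and duplicates before the upload keeps the list in MetNet identical to the set of entities you actually intend to test.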
MetNet allows the creation of "lists of entities", which can actually be either genes or metabolites. So we have your metabolomics experiments covered, too! In order to make use of this functionality, however, you need a personal "My MetNet" account. Don't worry if you don't have one already: you can easily create one and it's free of charge.
With your account information, proceed to log into your "My MetNet" portal.
Click on the "My Lists" link on the "My MetNet" menu bar and you may see lists that you created earlier. Either way, scroll down to the bottom of the page and you'll see a dialog that lets you create new lists based on an uploaded file (like the one that you prepared earlier).
Select the file, fill out a description for it, and press the "Create list from file" button.
When you choose to see the details of your list, you see the same genes (in this case they're actually probe IDs) as you had in your Excel spreadsheet. The green checkmark in front of each probe name indicates that MetNet was successful in matching your entity with one of its database entries. The same goes for metabolites: "water" would receive a green checkmark, but something random like "gtg54t234" would be preceded by a red cross. It would still be allowed in the list, though: maybe in the future somebody will discover a new gene and name it "gtg54t234". Since we're interested in pathways that these genes may have in common, we click on the "Common pathway" tab.
You can group pathways in three ways. Shown here is the default "by pathway" representation: it lists all pathways that are linked to at least one of the entities in your list, each followed by the entities from your list that are actually present in that pathway. You can also go the other route by selecting the "by entity" tab: this shows you each entity in your list, followed by the one or more pathways in which it can be found. The final tab presents an enumeration of the relevant pathways, without showing which genes were found in them.
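The three groupings are simply different views of the same entity-to-pathway links. The following sketch illustrates the idea with made-up probe and pathway names; it is not how MetNet itself is implemented.

```python
from collections import defaultdict

# Hypothetical matches: each entity in your list and the pathways it occurs in.
entity_to_pathways = {
    "probe_0001": ["pathway A"],
    "probe_0002": ["pathway B"],
    "probe_0003": ["pathway B", "pathway A"],
}


def by_entity(mapping):
    """Each entity, followed by the pathways in which it can be found."""
    return {entity: sorted(pathways) for entity, pathways in mapping.items()}


def by_pathway(mapping):
    """Each pathway, followed by the entities from the list that it contains."""
    grouped = defaultdict(list)
    for entity, pathways in mapping.items():
        for pathway in pathways:
            grouped[pathway].append(entity)
    return {name: sorted(members) for name, members in sorted(grouped.items())}


def pathways_only(mapping):
    """Just the relevant pathways, without the matching entities."""
    return sorted(by_pathway(mapping))


print(by_pathway(entity_to_pathways))
print(by_entity(entity_to_pathways))
print(pathways_only(entity_to_pathways))
```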
   
The problem with the above view is that it is qualitative: it tells you which entities (genes/probes or metabolites) can be found in which pathways, but not how relevant these results are. This can be a problem, especially if you have a substantial number of entities in your list. The fundamental question can be formulated as follows. Suppose you have three genes in your list. One gene matches a small pathway with only four genes in total, while the other two genes are found in a pathway with twenty genes in it. Which pathway tells you more about the function of any one of your genes? It comes down to the distinction between a qualitative interpretation of your findings (sometimes you may just be happy that an entity is present in a particular pathway) and a quantitative interpretation ("which pathways are the most relevant and related to my experimental conditions?").
 
Fisher's exact test is well suited to exactly this kind of question, and it has been integrated into the MetNet Online interface. You can access it by selecting "Search" in the main menu bar, followed by "Common pathway search".
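To get a feel for what such a test computes, here is a minimal sketch of a one-sided Fisher's exact test for over-representation, applied to the three-gene scenario described above. It rests on assumptions that MetNet may not share: the background of 1000 genes is invented, and the site may define its background or the sidedness of the test differently. The function name enrichment_p_value is hypothetical.

```python
from math import comb


def enrichment_p_value(hits, list_size, pathway_size, background_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns the probability of finding at least `hits` pathway members in a
    list of `list_size` genes drawn at random from `background_size` genes,
    of which `pathway_size` belong to the pathway.
    """
    outside = background_size - pathway_size
    max_hits = min(list_size, pathway_size)
    favourable = sum(
        comb(pathway_size, k) * comb(outside, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / comb(background_size, list_size)


BACKGROUND = 1000  # assumed number of genes in the database (illustrative only)

# One of your three genes falls in a pathway of four genes.
p_small = enrichment_p_value(1, 3, 4, BACKGROUND)
# Two of your three genes fall in a pathway of twenty genes.
p_large = enrichment_p_value(2, 3, 20, BACKGROUND)

print(f"4-gene pathway, 1 hit:   p = {p_small:.4f}")
print(f"20-gene pathway, 2 hits: p = {p_large:.4f}")
```

With these made-up numbers, the twenty-gene pathway with two hits receives the smaller p-value. The outcome depends on the background size, so treat it as an illustration of the principle rather than a prediction of what the website will report.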
For this function, you don't even need to be logged in with your "My MetNet" account. Simply copy your list of entities from Excel or another program, paste it into the text box on the webpage and click the "Show me the pathways" button.
The pathways are now ordered by p-value. As with any statistical test, a smaller p-value indicates higher significance: the smaller the p-value, the less likely such an overlap between your list and the pathway would be by chance alone, and the more influenced (and controllable) the pathway may be by your experimental conditions.
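If you copy the results into a script of your own, the same ordering takes only a few lines to reproduce. The pathway names and p-values below are invented for illustration.

```python
def rank_pathways(p_values):
    """Return (pathway, p-value) pairs, most significant first.

    Ties are broken alphabetically so that the order is reproducible.
    """
    return sorted(p_values.items(), key=lambda item: (item[1], item[0]))


# Hypothetical results.
p_values = {"pathway A": 0.012, "pathway B": 0.0011, "pathway C": 0.40}

ranked = rank_pathways(p_values)
for name, p in ranked:
    print(f"{name}\t{p:g}")
```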
This final diagram illustrates the concepts of this tutorial in a single overview: you can take expression data, next-generation sequencing data, or metabolomics data, and bring it into MetNet. You can then look at how your entities relate to the pathways present in our MetNet database. Depending on your goals, you can analyze and interpret the results either qualitatively or quantitatively.

This concludes our tutorial on use case 2, which illustrated how you can link expression data with pathways on the MetNet Online website. You can select another tutorial if you wish.


Copyright Wurtele lab