Introduction
We’ve been talking about fingerprinting for the last two weeks (part 1 here, part 2 here), meaning taking some geochemistry data and using it to infer something about the genetics of a deposit. Last time, we looked at some of the correlations we can expect to find. In other words, the elements that might define those fingerprints. Today, we are going to see if there are types or trends that we can see in the real dataset.
Types, trends, classes
What is a type in this context? It refers to a set of deposits that share some geochemical (and hopefully other, more on that later) characteristics. So if we say that there are deposits that are vanilla shear zones, and there are ones where a causative intrusion is very closely linked, then we might expect deposits within those classes to share some common characteristics.
In other words, we can plot the geochemistry data up and perhaps see some blobs in the data that match up to what we think about the geology. Just like with plain correlations, it’s difficult to make sense of this kind of thing in the regular high dimension space represented by multielement data. This time we will use PCA for its dimensionality reduction properties as well, not just as an exploratory tool. That means we will take the big wide geochemical data and look at it using just the first few components, with the idea that they have explained a lot of the variance of the data. At least, if we think there are natural classes present, they should be revealed in the 3D PCA space.
The plan
– import deposits and identify examples
– import samples and do qaqc
– decluster deposits
– aggregate sample data by deposit
– transform data
– look at the data
The data
See last time for more information about the input data, but it’s part of the SIGEOM rock sample database for Quebec.
For this part, we really want to only look at samples that could be considered anomalous for Au. That’s because we are specifically trying to understand mineralization. We’re going to end up with a lot of “lithology” information in the data no matter what we do, so our fingerprints will look a lot like lithology. Ideally we would be looking at just mineralization sampled directly, but that’s not available. So in the interest of moving forward, we will look at only (even very weakly) anomalous samples for Au.
This is the raw histogram of Au values in the database (as ppm).

As you can see, there are several highly repeated values that represent “below detection limit” results for different detection limits. We will make sure to remove at least these, because while there might be some actually anomalous (i.e., more than very close to 0) samples in there, it is mostly junk. We can use 20 ppb (0.02 ppm) as the cutoff here to capture “at least a bit more than nothing” level samples only.
The trimmed histogram looks like this:

Better.
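As a minimal sketch of that trimming step, assuming the samples sit in a pandas table with Au already in ppm: the column names and the toy values below are made up for illustration, and treating the cutoff as strictly "greater than" is my assumption.

```python
import pandas as pd

# Hypothetical sample table; "sample_id" and "au_ppm" are assumed column names.
samples = pd.DataFrame({
    "sample_id": ["a", "b", "c", "d", "e"],
    "au_ppm": [0.005, 0.005, 0.010, 0.350, 12.0],
})

# 20 ppb expressed in ppm, to match the units of the Au column.
AU_CUTOFF_PPM = 20 / 1000

# Keep only samples above the cutoff; the repeated low values
# (stand-ins for below-detection-limit codes) are dropped.
anomalous = samples[samples["au_ppm"] > AU_CUTOFF_PPM].reset_index(drop=True)
print(anomalous)
```

The only real trap here is units: the cutoff is quoted in ppb and the histogram is in ppm, so the conversion needs to happen once and explicitly.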
Type deposit data
I’m going to use three deposits to represent some possible classes: Malartic, LaRonde, and Sigma. I’m not going to get into why these ones, mostly it’s because I could name those three as kind of different looking deposits: intrusion-related, Au-rich VHMS, and good ol’ shear zone hosted orogenic gold respectively.
To investigate the idea that perhaps the idealized types of deposits represented by these examples can be detected in the data, I’m first going to look at how concentration of the elements varies between them.

This plot shows the relative concentration of the set of elements considered in this analysis for each example deposit. Note the log scale on the y-axis. There are several elements with orders of magnitude variation.
This is just the trace element data:

Same story. We can expect that these signals (if they correspond to real classes) could be present in the province-wide data.
Generating input data
For this step, we will take Au deposits across Quebec and decluster them, combining examples within around 2 km of each other into single deposits. If we don’t do this, we will end up splitting geochem between neighboring deposits and dropping more deposits than we want before we can process them. The map below shows the raw deposits and the clustered results; the producers for Quebec (past and present) are buried under them somewhere.
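One way to sketch the declustering is below. It is not necessarily the method used for the map: it assumes deposit coordinates in a projected system with units of metres, and it swaps in single-linkage hierarchical clustering from SciPy cut at 2 km. The coordinates are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical deposit coordinates in a projected CRS (metres).
deposit_xy = np.array([
    [0.0, 0.0],
    [500.0, 0.0],
    [1200.0, 300.0],
    [10000.0, 10000.0],
    [10800.0, 10100.0],
    [50000.0, 0.0],
])

# Single linkage chains together deposits whose nearest neighbour in the
# group is within the threshold; cutting the tree at 2000 m gives flat clusters.
tree = linkage(deposit_xy, method="single")
cluster_id = fcluster(tree, t=2000.0, criterion="distance")

# One representative location per declustered deposit: the centroid.
centroids = {
    int(c): deposit_xy[cluster_id == c].mean(axis=0)
    for c in np.unique(cluster_id)
}
print(cluster_id, centroids)
```

Single linkage can chain a long string of closely spaced deposits into one cluster much wider than 2 km. If that matters, complete linkage (`method="complete"`) caps the cluster diameter instead.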

Next we join samples to their nearest cluster and then aggregate the geochemistry data for each deposit cluster, so we end up with a local average of the geochemistry samples for each example. I’m using the mean here, although there is a case to be made for other strategies. So now we have a table that looks like this:

Once again we have to drop any aggregations that have missing values for the input columns. The coverage survives pretty well, although it is definitely biased towards the northern part of Quebec.
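The join, aggregation, and missing-value drop described above could be sketched like this. The tables, column names, and element list are hypothetical, and the nearest-neighbour lookup via a KD-tree is one convenient choice rather than a statement of what was actually run.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical declustered deposits (projected metres).
clusters = pd.DataFrame({"cluster_id": [1, 2], "x": [0.0, 10000.0], "y": [0.0, 0.0]})

# Hypothetical anomalous samples; Cu was never analysed near cluster 2.
samples = pd.DataFrame({
    "x": [100.0, -200.0, 9900.0, 10300.0],
    "y": [50.0, 0.0, 100.0, -50.0],
    "Cu": [10.0, 30.0, np.nan, np.nan],
    "Zn": [50.0, 70.0, 400.0, 600.0],
})

# Join each sample to its nearest deposit cluster.
tree = cKDTree(clusters[["x", "y"]].to_numpy())
_, nearest = tree.query(samples[["x", "y"]].to_numpy())
samples["cluster_id"] = clusters["cluster_id"].to_numpy()[nearest]

# Aggregate by deposit using the mean (NaNs are skipped within a group),
# then drop deposits that still have a missing value in any input column.
per_deposit = samples.groupby("cluster_id")[["Cu", "Zn"]].mean().dropna()
print(per_deposit)
```

Swapping `.mean()` for `.median()` is the obvious alternative strategy if a few very high-grade samples dominate a deposit's average.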

Transforming the data
We are going to use a pipeline that looks like this:

We will center log ratio (CLR) transform, then scale the data, and finally do PCA.
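For reference, here is a minimal sketch of a CLR, scale, PCA pipeline in scikit-learn. The `clr` helper is written here for illustration (it is not a scikit-learn function), the input is synthetic, and it assumes strictly positive concentrations: zeros or below-detection codes would need handling before the log.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def clr(X):
    """Centred log-ratio: log of each part minus the row mean of the logs."""
    log_x = np.log(np.asarray(X, dtype=float))
    return log_x - log_x.mean(axis=1, keepdims=True)


pipeline = Pipeline([
    ("clr", FunctionTransformer(clr)),   # open up the compositional data
    ("scale", StandardScaler()),         # zero mean, unit variance per element
    ("pca", PCA(n_components=6)),        # keep the first six components
])

# Synthetic stand-in for the deposit-by-element table (strictly positive).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 10))

scores = pipeline.fit_transform(X)
print(scores.shape)
```

Each CLR-transformed row sums to zero, which is a quick sanity check that the transform is doing what it should.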
The results are shown below. First here is the explained variance vs PC number:

The elbow is around PC6, meaning we’ve explained a lot of variance by there while still having lower dimensions. The idea this time isn’t to worry about what’s in the PC’s loading-wise, but to use them to reduce the dimensionality of the data. That being said, we can have a look at the loadings.
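To make the elbow and the loadings concrete, here is a small sketch of how both can be pulled out of a fitted PCA. The element list is hypothetical and the input is random noise standing in for the transformed, scaled deposit table, so the numbers themselves mean nothing.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical element list and a random stand-in for the transformed data.
elements = ["As", "Cu", "Zn", "Pb", "Ni", "Co", "Cr", "V"]
rng = np.random.default_rng(1)
Z = rng.normal(size=(150, len(elements)))

pca = PCA().fit(Z)

# Cumulative explained variance: the elbow is where this curve flattens.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Loadings table: one row per element, one column per component.
# These are the unit-length component vectors, not variance-scaled loadings.
loadings = pd.DataFrame(
    pca.components_.T,
    index=elements,
    columns=[f"PC{i + 1}" for i in range(pca.n_components_)],
)
print(cumulative.round(2))
print(loadings.round(2))
```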



Lots to take in here, if you’re interested. Predictably, we are getting a lot of lithology information, which is fine. Of course we will end up clustering or fingerprinting the surrounding rocks rather than the mineralization but we get what we get. You’ll note that I’ve removed Au and Ag from consideration.
Now let’s look at what some 2D plots of PC1, 2 and 3 spaces look like.

And here’s PC3 vs PC2.

The data points form a single large cluster with outliers and trends along PC1 and PC2. Now I’ve included some example deposits in here as I mentioned before. I want this to remain unsupervised, but I’m interested in seeing how these type examples relate to the clusters identified. The examples are “out there” a bit, rather than falling squarely within the main blob.
Clustering the transformed data
We will take the transformed data and identify three main clusters in PC1-6 space. The idea here is to compare the derived clusters to our “type” deposits, and see if they look meaningful.
Now KMeans is always going to give us clusters just like we tell it to. In this case, it looks like the clusters will be at least somewhat arbitrary. That is to say, the data fall along some continua rather than clearly defined and separate blobs. Let’s see what it looks like, once again in PC1-PC2 and PC2-PC3 spaces.
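A minimal sketch of that clustering step, assuming the PC1 to PC6 scores are in a NumPy array. The scores are random here, and the "type deposit" rows are just the first three rows, purely to show the lookup.

```python
import numpy as np
from sklearn.cluster import KMeans

# Random stand-in for the PC1-PC6 scores of each declustered deposit.
rng = np.random.default_rng(2)
scores = rng.normal(size=(300, 6))

# Ask for three clusters; KMeans will return three whether or not
# the data actually contain three separate groups.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)

# Hypothetical: pretend the first three rows are the type deposits
# and see which cluster each one lands in.
type_scores = scores[:3]
type_labels = kmeans.predict(type_scores)
print(np.bincount(labels), type_labels)
```

Fixing `random_state` keeps the cluster numbering stable between runs, which matters as soon as the labels get joined back to a map.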


It’s cool. The clusters are pretty arbitrary, but they do kind of capture the examples in the 3 dimensions. Once again, not trying to necessarily do that, but just want to gauge how three rather different looking deposits look in the transformed space.
Here it is in 3D.

And we can also look at these clusters spatially, to make sure we aren’t just getting clumps that correspond to say the different data style in the northern part of Quebec vs the southern part of the north (that’s one for canada.txt if I’ve ever heard it, but you know what I mean).

Outcomes
So the answer is that, for this data set, it might be useful to consider different classes of deposits. We’ve identified blobs in the geochemistry data, but we haven’t really made the link to geology yet.
To do that, we can look at the loadings, but that’s just high level. Comparing to “type” deposits is interesting, but it really leans into the idea that there really are “natural types” when in fact we simply want to improve our chances of success rather than establish some taxonomy of deposits. In other words, how do we actually use the interesting transformations we’ve explored here?
There are two ways.
First, we can aggregate the conceptual descriptions of the deposits in each cluster, along with the interpretations associated with them, which will help reveal overall patterns of conceptual models. Then we know what we should be using to build out our targeting model at a new deposit, based on its geochemical analogues.
But we can also consider the shape. That’s ultimately a big part of what we want to do here, perhaps of the most immediate utility. We can collect and crunch all the data there is, but ultimately we need to know where to drill next, which means we need to estimate what our hypothetical deposit is shaped like. So the real key here, all this geochem stuff being cool, is “can we estimate the shape of a mineral deposit based on geochemistry data”. In other words, how can these geochemical fingerprints actually help with the nuts and bolts of small-scale exploration. Big question, but one that is within our power to answer.
Next steps
Given an anomalous sample or drillhole, we determine which of our three clusters it belongs to. Then we have two questions:
1) How should we target the next hole? In other words, what is the expected mineralization shaped like?
2) What should we be weaving into our exploration model? What do we care about, based on analogy with other deposits?
We can aggregate those data by cluster, and finally what we will have is a map, a data-to-target framework that looks like:
measure geochemical data -> assign cluster -> look up other deposits in cluster -> determine important targeting criteria, estimate shape of mineralization
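The first three arrows of that chain can be sketched end to end by putting the transform and the clustering in a single scikit-learn pipeline. Everything below is illustrative: the deposit names, element count, and data are synthetic, and the `clr` helper is my own rather than a library function.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def clr(X):
    """Centred log-ratio transform for strictly positive data."""
    log_x = np.log(np.asarray(X, dtype=float))
    return log_x - log_x.mean(axis=1, keepdims=True)


# Synthetic stand-in for the aggregated deposit geochemistry.
rng = np.random.default_rng(3)
deposit_names = [f"deposit_{i}" for i in range(120)]  # hypothetical names
X = rng.lognormal(size=(120, 8))

model = Pipeline([
    ("clr", FunctionTransformer(clr)),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=6)),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
deposit_cluster = pd.Series(model.fit_predict(X), index=deposit_names)

# measure geochemical data -> assign cluster -> look up other deposits
new_sample = rng.lognormal(size=(1, 8))
assigned = int(model.predict(new_sample)[0])
analogues = deposit_cluster[deposit_cluster == assigned].index.tolist()
print(assigned, analogues[:5])
```

The last arrow, going from the list of analogues to targeting criteria and a shape estimate, is the part that still needs the geological descriptions joined in.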
Thanks for reading.

