Fingerprinting Archean gold deposits using geochemistry, part 2: correlations

Introduction

Last week, we explored different ways of using the geochemistry of elements other than gold when exploring for Au.

Geochemistry helps to:

define geochemical fingerprints so we can use non-gold proxies when looking for anomalies
use elements that associate with Au to identify “near miss” holes in the long section
vector effectively by discriminating between mineralization centered on intrusions vs shear zones

The key to all of that is understanding which elements associate with Au, and what their relative concentrations and behaviors mean. We came up with a starting list of elements associated with gold last time. Let’s see what we can dig up from some other data.

The periodic table

Periodic table of the elements

Let’s start with the basics. Gold is number 79 on the periodic table. The periodic table is sort of like a map: similar elements are grouped together, in rows (periods) and columns (groups), based on their chemical characteristics. There are several things that vary systematically across the periodic table, and they can help us to develop an intuitive feel for how things fit together.

Elements in the same group have similar valence electron configurations, so they tend to behave very similarly to each other. Elements in the same period share the same number of electron shells, and they vary from most metallic on the left to least metallic on the right. Atomic radius generally decreases from left to right across a period, which has important implications for geochemistry because it controls which minerals an element can fit into. Much more information on the “periodic trends” is available here.

Mapping Au’s friends

We know that proximity on the periodic table helps predict what will co-occur with Au. We also know that some elements are just straight up common or rare. What if we started by measuring the distance between Au and every other element, then also compiled their relative crustal abundances? That way we could identify candidates for Au’s buds, but also ignore elements that are either hopelessly common or hopelessly rare (sorry, Wilhelm Röntgen, namesake of roentgenium).

Measuring periodic table distance

The periodic table is not a continuous thing. It’s discrete, like a grid. On top of that, horizontal distance means something different than vertical distance. What is the best way to measure it then? It’s the Manhattan or taxicab distance (I’m walking here!).

Manhattan distance (red = blue = yellow) vs Euclidean distance (green)

Manhattan distance is measured like you would with city blocks: one block north is 1, one block east is 1, and so on, and the total distance is their sum. In our case we want to weight the east-west distance differently from the north-south distance. That’s easy: we just add a weight to each direction in the Manhattan distance formula. Maybe staying in the same column (north-south, i.e. the same group) matters 2x more, so each east-west step counts double? Let’s try that.
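A minimal Python sketch of the weighted taxicab distance. The (period, group) coordinates are standard periodic-table positions for the elements discussed in this post; the weights (1 per north-south step, 2 per east-west step) are an assumption, chosen because they reproduce the distances in the candidate table. Lanthanides and actinides have no group number in this simple grid, so they would need special handling.

```python
# (period, group) positions for Au and the candidate elements in this post.
POSITIONS = {
    "Au": (6, 11), "Cu": (4, 11), "Cd": (5, 12), "Ni": (4, 10),
    "Tl": (6, 13), "Zn": (4, 12), "In": (5, 13), "Co": (4, 9),
    "Ga": (4, 13), "Pb": (6, 14), "Sn": (5, 14), "Bi": (6, 15),
    "Ge": (4, 14), "Sb": (5, 15), "As": (4, 15), "W": (6, 6),
}


def weighted_manhattan(a, b, ns_weight=1.0, ew_weight=2.0):
    """Taxicab distance between two elements on the periodic-table grid.

    Each north-south step (change of period) costs ns_weight and each
    east-west step (change of group) costs ew_weight. The 1:2 default
    weighting is an assumption that matches the table in this post.
    """
    (period_a, group_a), (period_b, group_b) = POSITIONS[a], POSITIONS[b]
    return ns_weight * abs(period_a - period_b) + ew_weight * abs(group_a - group_b)
```

For example, `weighted_manhattan("Au", "As")` gives 10.0: two periods up (2 × 1) plus four groups over (4 × 2).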

Crustal abundance

I’ve taken upper crustal abundances from Rudnick and Gao.

Let’s look at the elemental abundance data for a second, or try to. Here is a pie chart with the most common trace elements on the outer ring, down to the least common in the middle. Each inner ring breaks down the “Other” slice of the ring outside it. I don’t know; it looked better in my head.

I’m trying to show the relative abundance of the trace elements here, and as you can see it is quite a range (9 orders of magnitude, in fact).

Maybe it’s easier to get a handle on the relative abundances with a sea life analogy. In terms of weight, for every blue whale worth of barium (infamously heavy element though it may be), you get one krill worth of iridium, and a seahorse worth of gold. No wonder it’s so hard to find!

Visualizing 9 orders of magnitude in weight from the humble krill to the mighty blue whale.

We want to find good pathfinders for Au (the seahorse), so we need elements that act like Au but are neither too common (whale) nor too rare (krill). If an element is rarer than gold, it isn’t particularly helpful, since it will be even harder to detect than the gold itself. On the other hand, if it is too common, we won’t be able to tell the Au-related signal from the background noise. What we want are elements that act like gold and are more abundant, but not so abundant that the signal gets lost.

Putting it together

Let’s have a look at what we have. I’m going to say that elements within 10 weighted units of Au on the periodic table are “close enough”.

Element	Weighted distance from Au	Upper-crust abundance relative to Au (ratio)
Cu	2	18666.7
Cd	3	60.0
Ni	4	31333.3
Tl	4	600.0
Zn	4	44666.7
In	5	37.3
Co	6	11533.3
Ga	6	11666.7
Pb	6	11333.3
Sn	7	1400.0
Bi	8	106.7
Ge	8	933.3
Sb	9	266.7
As	10	3200.0
W	10	1266.7

I think we are going to find that elements like Cu or Co may very well be correlated with Au, but because they are so much more common, they will swamp the signal. Therefore they might not make great pathfinder or fingerprint elements. So we will cut the list down to just: Cd, Tl, In, Sn, Bi, Ge, Sb, As, and W.
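The cut can be written as a simple filter. This is a sketch: the ppm values are back-calculated from the table’s ratios using Au = 1.5 ppb and should be consistent with Rudnick and Gao’s upper crust (verify before reuse), and the 10,000× upper cutoff is a hypothetical threshold, since the post doesn’t state one; it just happens to reproduce the list above.

```python
AU_PPM = 0.0015  # upper-crust Au, 1.5 ppb, expressed in ppm

# element: (weighted distance from Au, upper-crust abundance in ppm)
CANDIDATES = {
    "Cu": (2, 28.0), "Cd": (3, 0.09), "Ni": (4, 47.0), "Tl": (4, 0.9),
    "Zn": (4, 67.0), "In": (5, 0.056), "Co": (6, 17.3), "Ga": (6, 17.5),
    "Pb": (6, 17.0), "Sn": (7, 2.1), "Bi": (8, 0.16), "Ge": (8, 1.4),
    "Sb": (9, 0.4), "As": (10, 4.8), "W": (10, 1.9),
}


def pathfinder_candidates(candidates, max_distance=10, min_ratio=1.0, max_ratio=10_000.0):
    """Keep elements close to Au on the periodic table that are more abundant
    than Au (ratio > min_ratio) but not overwhelmingly so (ratio < max_ratio).
    The max_ratio default is a hypothetical cutoff."""
    keep = []
    for element, (distance, ppm) in candidates.items():
        ratio = ppm / AU_PPM
        if distance <= max_distance and min_ratio < ratio < max_ratio:
            keep.append(element)
    return keep
```

With these inputs, `pathfinder_candidates(CANDIDATES)` returns `['Cd', 'Tl', 'In', 'Sn', 'Bi', 'Ge', 'Sb', 'As', 'W']`; Cu, Ni, Zn, Co, Ga, and Pb all sit above the cutoff.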

Real data

Let’s see what actually correlates with Au. We’re going to use the SIGEOM rock geochemistry dataset for Quebec and do some simple analysis to determine elements that co-occur with Au.

First I’ll import the data.

I want to filter down samples to just those that occur near a gold showing.
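One way to do the spatial filter is sketched below with plain NumPy. It assumes sample and showing locations are already in projected coordinates in metres (e.g. UTM); the 1 km default radius is hypothetical, since the post doesn’t say what “near” means.

```python
import numpy as np


def near_showings(sample_xy, showing_xy, radius_m=1_000.0):
    """Boolean mask: True where a sample lies within radius_m of any showing.

    sample_xy has shape (n_samples, 2) and showing_xy has shape (n_showings, 2),
    both in projected metres (assumption). This brute-force version builds the
    full distance matrix; for a province-wide dataset a KD-tree would scale better.
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    showing_xy = np.asarray(showing_xy, dtype=float)
    # Pairwise Euclidean distances, shape (n_samples, n_showings)
    d = np.linalg.norm(sample_xy[:, None, :] - showing_xy[None, :, :], axis=-1)
    return (d <= radius_m).any(axis=1)
```

For example, samples at (0, 0), (500, 0), and (5000, 0) with one showing at (0, 100) give the mask `[True, True, False]`.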

Next I do some basic cleanup to deal with weird column names. In this dataset, below detection limit is sometimes denoted with a negative number; other times the detection limit itself, or half of it, is used. There are quite a few 0s as well.
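A hedged sketch of one common way to standardize below-detection codes in a single analyte column. The convention here (a negative value −d means “below detection limit d” and becomes d/2; exact zeros become missing) is an assumption, not necessarily what was done for this post. Values already reported as the detection limit or half of it can’t be told apart from real measurements without the lab metadata.

```python
import numpy as np


def clean_bdl(values):
    """Standardize below-detection-limit (BDL) codes in one analyte column.

    Assumed convention (hypothetical; check the dataset documentation):
    - a negative value -d means "below detection limit d" and is replaced by d / 2;
    - exact zeros are treated as missing (NaN), since they can't be log-transformed later.
    """
    v = np.asarray(values, dtype=float).copy()
    negative = v < 0
    v[negative] = -v[negative] / 2.0
    v[v == 0] = np.nan
    return v
```

For instance, `clean_bdl([-2, 0, 5])` gives `[1.0, nan, 5.0]`.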

Another issue is that some columns contain many repeated values (generally below-detection values, or something more problematic). In any case, they aren’t informative, so I want to cut them here before they cause problems later on.
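A minimal sketch of how repetitive columns could be flagged: measure how much of each column its single most common value accounts for, and drop columns above a threshold. The 50% default is a hypothetical choice.

```python
import numpy as np


def informative_columns(data, names, max_mode_fraction=0.5):
    """Return the names of columns whose most common value makes up at most
    max_mode_fraction of the non-missing entries (threshold is hypothetical).

    data: array of shape (n_samples, n_columns), with NaN for missing values.
    """
    data = np.asarray(data, dtype=float)
    keep = []
    for j, name in enumerate(names):
        column = data[:, j]
        column = column[~np.isnan(column)]
        if column.size == 0:
            continue  # an entirely empty column is not informative either
        _, counts = np.unique(column, return_counts=True)
        if counts.max() / column.size <= max_mode_fraction:
            keep.append(name)
    return keep
```

A column where three of four samples read exactly 5 (a typical detection-limit artifact) would be dropped; a column of all-distinct values would be kept.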

We will use PCA to find which elements correlate with Au in this dataset. PCA doesn’t work with missing data, so we need to drop rows with missing values and columns that are frequently missing. This is a classic problem with regional geochemistry data. What I’ve found works is to drop columns iteratively and, at each step, record the number of complete rows retained along with the product of retained rows and retained columns. I choose the “optimal” number of columns based on where that product really drops off, beyond which keeping more rows gives diminishing returns. Depending on what I’m doing, I might change how that goes by insisting on keeping certain columns, always removing certain columns, or weighting rows over columns to preserve spatial coverage. In this case we have a large dataset, so I will be greedy. We are left with the following columns: Au, Cu, Ba, Sr, Zn, Ni, Ag, Zr, Y, Co, Cr, Pb, Mo, V, As, W, Sb, Sc, Nb, Be, La, Bi, U, Tl, Ta, Cs
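The column-dropping trade-off can be sketched like this. Columns are added from least to most missing (with any forced columns such as Au first), and for each number of columns k we count the complete rows and the product k × rows. Picking the maximum product at the end is a hypothetical automatic stand-in for the by-eye judgment of where the curve drops off.

```python
import numpy as np


def coverage_curve(data, names, must_keep=()):
    """For k = 1..n_columns, keep k columns (forced columns first, then the rest
    ordered from least to most missing) and count the rows that are complete
    across those columns.

    Returns a list of (k, column_names, n_complete_rows, k * n_complete_rows).
    """
    data = np.asarray(data, dtype=float)
    names = list(names)
    missing = np.isnan(data)
    n_missing = missing.sum(axis=0)
    forced = [names.index(n) for n in must_keep]
    rest = sorted((j for j in range(len(names)) if j not in forced),
                  key=lambda j: n_missing[j])
    order = forced + rest
    curve = []
    for k in range(max(1, len(forced)), len(order) + 1):
        cols = order[:k]
        n_rows = int((~missing[:, cols]).all(axis=1).sum())
        curve.append((k, [names[j] for j in cols], n_rows, k * n_rows))
    return curve
```

Usage: `best = max(coverage_curve(data, names, must_keep=("Au",)), key=lambda c: c[3])`. In practice, plotting rows and the product against k and choosing the knee by eye gives more control than the argmax.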

A respectable list. And we can check the spatial coverage:

Not bad either, let’s move on.

Transforming the data

We need to do some work on the data before we can dig in with PCA. We will do two things:

centered log-ratio (CLR) transform to deal with the closure problem
scale the data

The closure problem and CLR

OK closure people, you can put away your knives. For those not in the know, there is an issue with applying PCA directly to geochemical data, because geochemical data are compositional: the numbers represent relative proportions (parts per million), not absolute amounts. That matters for PCA for two reasons. First, closure forces correlations, because increasing one part must decrease the others. Second, raw concentrations are very far from normally distributed, so the assumption of normality is badly violated. As a result, you can end up with strong associations that don’t mean anything.

Now I wouldn’t be worth my salt if I didn’t offer you some more information about this, along with one of my trademark opinions. To be honest, I’ve come full circle on this one. Initially, I dismissed the problem because it seemed like only a small, vocal group of people ever talked about it, and the evidence presented was too boring and complicated to really look into. Besides, I was scaling the data before further analysis anyhow. But I wouldn’t advise you to add a step unless I thought it mattered, and I’ve experimented with this on my own: trust me, it has a major effect, at least on the loadings. So you should do it.

The centered log ratio is the transformation of choice for opening the data, and it looks like this:
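For reference, the standard definition of the CLR for a single sample x = (x_1, …, x_D) with D parts:

```latex
\operatorname{clr}(\mathbf{x}) =
\left[\ln\frac{x_1}{g(\mathbf{x})},\ \ln\frac{x_2}{g(\mathbf{x})},\ \dots,\ \ln\frac{x_D}{g(\mathbf{x})}\right],
\qquad
g(\mathbf{x}) = \left(\prod_{i=1}^{D} x_i\right)^{1/D}
```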

Here g(x) is the geometric mean of the sample x, and x1…xD are its D parts (the individual element concentrations in that sample). Fortunately, it’s implemented in the scikit-bio and pyrolite packages, so it’s easy to add to your pipeline.
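If you’d rather not add a dependency, the CLR is a few lines of NumPy. This sketch applies it row-wise (one row per sample) and uses the fact that dividing by the geometric mean is the same as subtracting the mean of the logs. It assumes zeros and below-detection codes have already been dealt with, since log(0) is undefined; scikit-bio’s `skbio.stats.composition.clr` is a packaged alternative.

```python
import numpy as np


def clr(X):
    """Centered log-ratio transform, applied to each row (sample) of X.

    X: array of shape (n_samples, D) with strictly positive values.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("CLR needs strictly positive values; handle zeros/BDL first")
    log_x = np.log(X)
    # ln(x_i / g(x)) = ln(x_i) - mean_j ln(x_j)
    return log_x - log_x.mean(axis=1, keepdims=True)
```

Each transformed row sums to zero, and the result is unchanged if a sample is rescaled (ppm vs ppb), which is exactly the property that makes it suitable for compositional data.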

Scaling the data

The other thing I want to do is scale the data. Arguably this isn’t all that necessary after the CLR transform, but I think it makes sense to do. Scaling puts all elements on a comparable footing (for example, a similar range, or unit variance). How exactly that scaling is done matters, because some approaches are strongly affected by outliers. Unscaled data will mess up PCA because PCA is based on variance: elements whose values are simply higher (looking at you, phosphorus) will have a higher variance and so will tend to dominate the loadings. I remember seeing a La/P ratio touted for gold exploration years ago, which was 100% due to unscaled data and totally meaningless.
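Two illustrative column-wise scaling options are sketched below; the post doesn’t say which scaler was used. Plain z-scores give each element mean 0 and standard deviation 1, while the robust option (median and interquartile range) is less influenced by outliers.

```python
import numpy as np


def standardize(X, robust=False):
    """Scale each column (element) of X before PCA.

    robust=False: z-scores, (x - mean) / standard deviation.
    robust=True:  (x - median) / interquartile range, less sensitive to outliers.
    """
    X = np.asarray(X, dtype=float)
    if robust:
        center = np.median(X, axis=0)
        q75, q25 = np.percentile(X, [75, 25], axis=0)
        spread = q75 - q25
    else:
        center = X.mean(axis=0)
        spread = X.std(axis=0)
    # Avoid dividing by zero for constant columns
    spread = np.where(spread == 0, 1.0, spread)
    return (X - center) / spread
```

With the robust option, a single extreme value (say 100 among values of 1 to 4) stays an obvious outlier instead of compressing every other sample toward zero.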

PCA

OK time for the fun part. We will fit a PCA model to the data and look at some of the results.

This shows the explained variance ratio for each component. Quantitatively, this plot can help you pick how many PCs to use if you want to reduce dimensionality. Normally the elbow (around PC 5 here) is used, but it’s up to the modeler.
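For readers who want to see where the explained variance ratio comes from, here is a minimal PCA via singular value decomposition in NumPy. This is a sketch, not necessarily the library used for this post (scikit-learn’s `PCA` exposes the same quantities as `explained_variance_ratio_` and `components_`). Note that some authors scale the component vectors by the square root of the eigenvalues before calling them loadings.

```python
import numpy as np


def pca(X):
    """PCA of an (n_samples, n_features) array via SVD.

    Returns (scores, components, explained_variance_ratio). Row k of
    components holds the weight of each feature (element) on PC k+1.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA works on column-centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance_ratio = s**2 / np.sum(s**2)
    scores = U * s  # sample coordinates in PC space
    return scores, Vt, explained_variance_ratio
```

Plotting `explained_variance_ratio` against component number gives the elbow plot described above.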

Qualitatively, I like to see a sharp but not too sharp bend in this bad boy. Thinking about the geology, we usually want to see something about lithology in PC1 (often felsic vs mafic), then something about mineralization in PC2, then something like different kinds of felsic rocks in PC3. It’s not a hard and fast rule by any stretch, and you need to consider the dataset’s who, what, when, why, where, and how. The transform itself will always work, because it’s just a mathematical construction. What weaker PCs mean is that interpreting the loadings (again, not strictly necessary for using PCA, but often of interest in geology) is on shakier ground.

Loadings

Here are the loadings for PC2 (the component on which Au loads most strongly).

And then just the positive loadings:

For now, this is what we are after. We don’t really want to do anything with the transformed data yet; the purpose here was to answer the question of what lines up with Au in a real dataset. What we see is that Au loads positively on PC2 together with Be, Cu, U, Pb, Mo, Ag, Sb, W, Tl, Bi, As, and Cs, so these are the elements positively associated with Au in this dataset.
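Reading off the associates can be automated. This sketch takes the component weights (e.g. the Vt from an SVD, or scikit-learn’s `components_`), finds the PC on which Au has the largest absolute loading, and lists the other elements that load positively on it. Because the sign of a PC is arbitrary, the component is flipped when needed so that Au’s loading is positive; otherwise “positive loadings” could silently mean “anti-correlated with Au”.

```python
import numpy as np


def au_associates(components, names, target="Au"):
    """Return (PC number, elements loading positively with target on that PC).

    components: array (n_components, n_features); row k is PC k+1.
    Elements are returned from strongest to weakest positive loading.
    """
    components = np.asarray(components, dtype=float)
    names = list(names)
    j = names.index(target)
    k = int(np.argmax(np.abs(components[:, j])))
    # Flip the arbitrary PC sign so the target's loading is positive
    pc = components[k] * np.sign(components[k, j])
    order = np.argsort(-pc)
    associates = [names[i] for i in order if i != j and pc[i] > 0]
    return k + 1, associates
```

Sharing a PC is not the same thing as a pairwise correlation, but it is a reasonable first screen for which elements travel with Au.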

Comparing a priori set to real set

Let’s see how that lines up with our estimate from earlier.

Element	A priori	Empirical
Ag		✅
As	✅	✅
Be		✅
Bi	✅	✅
Cd	✅
Co	✅
Cs		✅
Cu	✅	✅
Ga	✅
Ge	✅
In	✅
Mo		✅
Ni	✅
Pb	✅	✅
Sb	✅	✅
Sn	✅
Tl	✅	✅
U		✅
W	✅	✅
Zn	✅

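Makes sense! I had left Ag out before because I put it in the “precious” category, not trace. The a priori elements that don’t come through in the empirical data could be missing due to a lack of correlation, issues with the coverage of the data (few samples included the element), or problems with the data values themselves (many repetitive values). Note that Cd, Ga, Ge, In, and Sn are not among the columns retained for PCA, so this analysis couldn’t test them at all; Co, Ni, and Zn were retained but didn’t load positively with Au.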
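The comparison itself is just set arithmetic. The three sets below are copied from this post (the a priori candidates within 10 weighted units of Au, the positive PC2 loadings, and the columns retained for PCA); nothing here re-runs the analysis.

```python
a_priori = {"Cu", "Cd", "Ni", "Tl", "Zn", "In", "Co", "Ga", "Pb",
            "Sn", "Bi", "Ge", "Sb", "As", "W"}
empirical = {"Be", "Cu", "U", "Pb", "Mo", "Ag", "Sb", "W", "Tl", "Bi", "As", "Cs"}
# Columns that survived the missing-data step and went into the PCA
analysed = {"Au", "Cu", "Ba", "Sr", "Zn", "Ni", "Ag", "Zr", "Y", "Co", "Cr", "Pb", "Mo",
            "V", "As", "W", "Sb", "Sc", "Nb", "Be", "La", "Bi", "U", "Tl", "Ta", "Cs"}

both = sorted(a_priori & empirical)             # ['As', 'Bi', 'Cu', 'Pb', 'Sb', 'Tl', 'W']
only_a_priori = sorted(a_priori - empirical)    # ['Cd', 'Co', 'Ga', 'Ge', 'In', 'Ni', 'Sn', 'Zn']
only_empirical = sorted(empirical - a_priori)   # ['Ag', 'Be', 'Cs', 'Mo', 'U']
never_tested = sorted(a_priori - analysed)      # ['Cd', 'Ga', 'Ge', 'In', 'Sn']
```

The last line separates “no correlation found” from “never had the chance to correlate”, which matters when deciding whether to go back and look for better Cd, In, or Sn data.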

Summary and next steps

We’ve come up with a list (Be, Cu, U, Pb, Mo, Ag, Sb, W, Tl, Bi, As, and Cs) to use as Au pathfinder and fingerprinting elements. Next time, we will apply that list to see if it can help discriminate types of deposits, or maybe find new ones.

Shaun

References

Rudnick, R.L. and Gao, S. 2014. 4.1 – Composition of the continental crust. In Treatise on Geochemistry, eds. Holland, H.D. and Turekian, K.K. Elsevier, 2014, p. 1-51. https://doi.org/10.1016/B978-0-08-095975-7.00301-6

Images

Periodic table 2012rc, CC BY 3.0, via Wikimedia Commons

Blue whale Encyclopædia Britannica, Inc.

Krill Øystein Paulsen, CC BY-SA 3.0, via Wikimedia Commons

Seahorse Emőke Dénes, CC BY-SA 4.0, via Wikimedia Commons