## INTRODUCTION

Ever since the first determinations of the age of archeological objects by radiocarbon (^{14}C) in 1949 (Libby et al. 1949; Arnold and Libby 1949), researchers have striven for easier and more accurate measurements. One of the first breakthroughs was the gas proportional counting technique using CO_{2}, developed and applied by de Vries and Barendsen (1952, 1954). In the 1980s, accelerator mass spectrometry (AMS) revolutionized the field: faster measurements and the capability to use orders of magnitude less sample made numerous new applications of ^{14}C possible (for example Damon et al. 1989; Linick et al. 1989; Cook et al. 2003; Meijer et al. 2006; Rasmussen et al. 2009; Dee et al. 2013). Since then, innovations to the AMS technique have led to incremental but important improvements, mostly concerning the precision and accuracy achievable.

Such innovations are illustrated by the changes in equipment used for ^{14}C measurement at the Centre for Isotope Research (CIO) of the University of Groningen. Proportional counters were in operation from the early 1950s until 2011. From 1994 until 2017 the CIO operated a 3 MV ^{14}C-dedicated accelerator mass spectrometer (AMS), the “tandetron” (High Voltage Engineering Europa, the Netherlands) (Gottdang et al. 1995). Since September 2017, a MICADAS (MIni CArbon DAting System, Ionplus, Switzerland) has been in operation (Synal et al. 2007; Wacker et al. 2010). The MICADAS outperforms the previous AMS in several respects, the most important being the increased efficiency of the source (typically 5% compared to ≈1%, resulting in more counts) and, in addition to measurement on graphite, the ability to measure CO_{2} gas directly.

With the former generation of accelerator mass spectrometers, the number of individual counts acquired from a sample nearly always limited the ^{14}C measurement uncertainty (a consequence of the underlying Poisson statistics) and, in fact, also determined the final reported uncertainty. All other contributions to the ^{14}C measurement uncertainty (such as the background variability and the ^{13}C correction) were much smaller, which meant in practice that they were both negligible and impossible to determine. With the new MICADAS machine this is no longer the case, thanks to the much higher count rates that result from the increased source efficiency. As the Poisson uncertainties have decreased dramatically, other variables in the measurement now contribute noticeably to the ^{14}C measurement uncertainty, even though this measurement uncertainty is much lower than before. In addition, the contributions of the various preprocessing and processing steps to the final reported uncertainty are no longer negligible either. For a reliable and confident estimate of this final uncertainty, more properly called the expanded uncertainty (as defined in JCGM 100 2008), we have to take all these contributions into account.

In this paper, we systematically evaluate those other contributions, such that in the end we obtain a reliable expanded uncertainty. With a total of approximately 7000 ^{14}C measurements in its first one and a half years of operation, we have gathered abundant information for this detailed uncertainty analysis.

We are, obviously, not the first to publish a manuscript discussing uncertainty analysis of ^{14}C measurements. Stuiver and Polach (1977) addressed the issue in their much-cited paper, and Hedges et al. (1989) describe which sources contribute to the uncertainty. Scott et al. (2007) give a general overview of uncertainty calculation in radiocarbon dating. None of these papers, however, treats the subject in all the detail that we show and explain in the present work.

In the coming sections, we demonstrate quantification of uncertainty sources of the various steps in the process from sample to ^{14}C measurement as well as of the actual ^{14}C measurements. To determine and quantify each uncertainty contribution in this whole process, the long-term performance of our secondary references is key. However, as these materials are all pure, homogeneous substances, uncertainty based on their analysis alone might be systematically too small. Therefore, in addition we monitored known-age samples and unknown sample duplicates in various phases of the preparation process. In this way, we could establish whether sample inhomogeneity, complicated combustion conditions, or contaminations from the environment (soil-derived compounds, CO_{2} from laboratory air, contamination through memory effect during combustion) play a significant role.

## EXPERIMENTAL SETUP AND METHOD

The preparation of samples for ^{14}C analysis depends on the type of material. Archaeological samples (wood, bone, charcoal, seeds) usually need chemical pretreatment, followed by combustion, whereas CO_{2} in air and in water only needs to be extracted, which can be done in various ways (for example cryogenically or by acidification of alkaline solutions). Still other samples, like carbonates, are treated with acid to convert them to CO_{2}. The CO_{2} produced, or isolated, is subsequently graphitized (reduced to elemental carbon) and pressed into AMS sample holders (usually called cathodes), with which the actual measurement can be performed.

During all the steps from sample preparation to measurement, utmost care and attention are required to keep the contamination accumulated to a minimum, as the ^{14}C variability will increase along with the amount of contamination. The accumulated contamination contributes to a greater expanded uncertainty.

In this paper, we restrict ourselves to samples with a regular graphite mass (2 mgC: mg of carbon). Gas measurements and small-sized (<0.6 mgC) graphite samples will be dealt with in a forthcoming publication. In the next section, we will briefly describe all the relevant preparation steps, with emphasis on the aspects that influence the ^{14}C signal in the samples, and thus contribute to the final uncertainty.

### Processes from Sample to ^{14}C Measurement and Possible Contamination Sources

#### Sample

A ^{14}C measurement is performed on a small portion of the original sample. Inhomogeneity in the sample material therefore affects the reliability of dates. It is an important contribution to the final reported uncertainty, yet hard to quantify: this is possible only with duplicate or even multiple sampling, as we will show.

#### Chemical Pretreatment

Sample-specific chemical pretreatment, in other words the best way to collect an isotopically reliable carbon fraction from each type of material, has been and continues to be the subject of many publications, discussions, and round-robin tests between laboratories. The routine chemical pretreatments used in our laboratory were originally summarized by Mook and Streurman (1983) and more recently by Dee et al. (2020).

Chemical pretreatment can affect the final isotopic composition of a sample and hence contribute to the expanded uncertainty in several ways, such as through the introduction of contamination with “foreign” carbon during sample handling, or the incomplete removal of contamination that was originally present in the sample.

#### CO_{2} Production

Different types of material need different techniques to produce CO_{2}. The techniques currently being used at the CIO are described in Dee et al. (2020). Solid samples, like bones, seeds, charcoal, and wood, are combusted to produce CO_{2}. Carbonates are converted to CO_{2} in an in-house glasswork manifold, which is described fully in Meijer (2009). The method of extracting CO_{2} and measuring ^{14}C in cremated bone samples was first reported in Lanting et al. (2001). CO_{2} in air is cryogenically extracted or chemically captured in a sodium hydroxide solution.

Possible contributions from the CO_{2} production to the expanded uncertainty are incomplete combustion, memory from one sample to the next (either in the combustion oven, or in the CO_{2} trap), contamination from the reagents (oven chemicals, helium, oxygen, and sodium hydroxide) and contamination during sample handling, for example not using ultraclean equipment and instruments. In addition, the exchange of CO_{2} with CO_{2} from a previous sample at the glass surface (wall absorption/desorption) and leakages can also cause contamination. The risk of contamination in various steps will lead to added uncertainty in the final result.

#### Graphitization Systems

The CO_{2} samples from all different sources have to be reduced to elemental carbon (commonly referred to as graphitized) for ^{14}C measurements with higher precision. Our graphitization set-up is described in Aerts-Bijma et al. (1997), De Rooij et al. (2010), and Dee et al. (2020).

Possible contributions from the graphitization process to the expanded standard uncertainty are exchange with previous CO_{2} at the glass surface and the presence of contamination in the glasswork manifold or in the iron powder. We store the elemental carbon samples (graphite) produced in the reaction tubes in which they were formed, at room temperature and with argon added up to atmospheric pressure. These samples are pressed just before measurement into aluminum sample holders with an automated home-built press at approximately 1 MPa (Aerts-Bijma et al. 1997).

During the graphitization process, sample-to-sample contamination can also occur if the reactors are not cleaned sufficiently. Furthermore, experience has taught us that the pressed graphite-iron mixture is susceptible to carbon uptake from air, and even while under vacuum in the ionization chamber (Paul et al. 2016).

#### ^{14}C Measurements

We performed all the relevant ^{14}C measurements on our MICADAS. In our routine operation, a regular batch consists of five Oxalic Acid II references, necessary for the calibration of the batch (the tuning of the machine also requires one additional Oxalic Acid II reference), four sample-specific background references (^{14}C-free material, resembling the sample materials as much as possible), two secondary references and 28 samples (unknowns). Approximately forty minutes of measurement time per sample yields typically 750,000 ^{14}C counts for the Oxalic Acid II calibration material.
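As a rough illustration of what such a count total means for precision, the relative Poisson uncertainty of N counts is 1/√N. The sketch below assumes nothing beyond that relation; the function name is illustrative and not part of any laboratory software.

```python
import math


def poisson_relative_uncertainty(counts: int) -> float:
    """Relative 1-sigma counting uncertainty of a Poisson-distributed count total."""
    if counts <= 0:
        raise ValueError("counts must be positive")
    return 1.0 / math.sqrt(counts)


# Typical Oxalic Acid II count total quoted in the text.
rel = poisson_relative_uncertainty(750_000)
print(f"{100 * rel:.3f} %")  # about 0.115 %
```

For 750,000 counts this gives a counting-only contribution of a little over one per mille, which is why the other contributions discussed in this paper become visible.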

### Input Data for the Uncertainty Analysis

As part of our general quality control and assurance procedures, (secondary) reference materials and background materials are processed alongside the samples, and these materials are selected to resemble the various sample types as closely as possible. The results for these various materials are of course a valuable source of information for our uncertainty assessment. In addition, sample duplicates are also frequently analyzed.

#### Samples Analyzed as Duplicates

In our routine operation, we regularly prepare sample duplicates. The sample is divided into two portions and then the chemical pretreatment, the CO_{2} extraction, the graphitization and the ^{14}C measurement are performed on different days, as if they were two different unknowns. Thus, everything in the whole process is performed as independently as possible. These full duplicates yield a wealth of information about the expanded uncertainty. As they are pretreated and measured in different batches, their uncertainties are to a large extent independent.

To discriminate these “full” duplicates from other partial duplicates (see below) we call them pretreatment duplicates. In order to enhance the readability of this paper we divided the different kinds of duplicate also into categories from 1 to 4. These pretreatment duplicates are from now on called category (cat.) 4 duplicates, as all four steps from chemical pretreatment, CO_{2} preparation, graphitization and ^{14}C measurement are different for these duplicates.

A special cat. 4 duplicate is the VIRI F horse bone. In some series of bones that require pretreatment, leftover material from the VIRI intercomparison (Scott et al. 2010), the VIRI F horse bone, is pretreated alongside the samples. This sample is considered a known-age cat. 4 duplicate, because many ^{14}C laboratories have dated this material.

CO_{2} preparation duplicates are samples divided into different portions after the chemical pretreatment, which are then separately handled further, so these duplicates share the chemical pretreatment, but not the following three steps (CO_{2} preparation, graphitization, and ^{14}C measurement). These duplicates are thus cat. 3 duplicates.

For the graphitization duplicates, CO_{2} from one CO_{2} preparation process is divided into two or three portions, and each portion is separately graphitized, making them cat. 2 duplicates.

A cat. 1 ^{14}C measurement duplicate means that the graphite formed is split into two portions (so all parts of the process before are common). Such duplicates rarely occur.

All duplicates contribute to a better understanding of the contributions of the different preparation steps to the expanded uncertainty. The different duplicates are further clarified in Figure 1.
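The category scheme can be summarized compactly: the category number equals the number of trailing process steps that are carried out separately for each member of a duplicate. The sketch below encodes that reading; the enum and function names are hypothetical and serve only as an illustration.

```python
from enum import IntEnum

# Process steps in chronological order, as named in the text.
STEPS = (
    "chemical pretreatment",
    "CO2 preparation",
    "graphitization",
    "14C measurement",
)


class DuplicateCategory(IntEnum):
    MEASUREMENT = 1      # graphite split into two portions
    GRAPHITIZATION = 2   # CO2 split before graphitization
    CO2_PREPARATION = 3  # sample split after chemical pretreatment
    PRETREATMENT = 4     # sample split before chemical pretreatment


def independent_steps(cat: DuplicateCategory) -> tuple:
    """Steps performed separately for each member of the duplicate."""
    return STEPS[len(STEPS) - int(cat):]


def shared_steps(cat: DuplicateCategory) -> tuple:
    """Steps that the members of the duplicate have in common."""
    return STEPS[:len(STEPS) - int(cat)]


for cat in DuplicateCategory:
    print(cat.value, "independent:", ", ".join(independent_steps(cat)))
```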

#### Background Materials

Several sample-specific ^{14}C-free (“dead”) materials are available in our laboratory for identification of the “background”; that is, the modern carbon contamination accumulated during the process from chemical pretreatment until ^{14}C measurement. Background wood (“bgw”, Kitzbuhel I, Tirol, Austria) and background collagen (“bgc”, Latton Quarry LQH 12) (Cook et al. 2012) have been selected for measuring the combustion background. Both materials are known to be far older than the detection limit of ^{14}C. The background material for carbonates (shells and cremated calcined bones) is named GS-35 (grained marble from a stonemasonry in Groningen). The graphitization background gas is Rommenhöller CO_{2}, a fossil gas of geological origin (Linde gases).

#### Secondary References

Besides the sample duplicates and background materials, several secondary references are combusted and graphitized together with the samples and treated as “samples of known ^{14}C content”, to monitor the total process. As the secondary references are all pure substances, they form the basis for determining and quantifying each uncertainty contribution in the whole process. These secondary references are IAEA-C8, IAEA-C7 (oxalic acid, Le Clercq et al. 1998), and GS-51 (Groningen Standard, cane sugar). For these references, extensive measurement records over more than 20 years have been compiled. The present study is, however, based on data measured with the MICADAS only.

These secondary reference materials are treated in two separate ways. The first involves them being combusted in large quantities, yielding five to ten liters of pure CO_{2} collected in small cylinders (called “bulk”). Thanks to the large quantities of gas, we can use them to prepare many samples over the years, which then do not contain combustion-induced variability (in analogy to the different types of sample duplicates, we can call them cat. 2 secondary references). Therefore, these cylinder gas analyses can monitor the variability induced by the graphitization process and the subsequent measurement alone. However, we also use these secondary reference materials as individual samples, where they are combusted in small amounts (2 mgC) and serve as combustion references (cat. 3 secondary references).

## CALCULATION OF OUR BEST ESTIMATE FOR THE ^{14}C MEASUREMENT UNCERTAINTY

The ^{14}C content of a sample is expressed as Fraction Modern (Reimer et al. 2004). Because the original batch of the calibration material, Oxalic Acid I, is exhausted, Oxalic Acid II is the international calibration reference (cal), with assigned values **F**^{14}C_{n} $\equiv$ 134.066% and δ^{13}C_{VPDB} $\equiv$ –17.8‰.

Every measured sample is calibrated using:

The subscripts sample, bg and cal refer to sample, background and Oxalic Acid II, respectively; $\delta {}^{13}{C_{sample}}$ is the value measured by the MICADAS; its $\delta {}^{13}C$ scale is calibrated using the assigned Oxalic Acid II value of δ^{13}C_{VPDB} = –17.8‰.
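The variables listed above suggest the usual background-corrected, δ^{13}C-normalized ratio of sample to calibration reference. The expression below is a sketch of that conventional form, given here as an assumption; it is not necessarily identical, term by term, to Eq. (1). The δ values are taken as fractions (the ‰ value divided by 1000), and 1.34066 is the assigned Oxalic Acid II value quoted above.

```latex
\mathbf{F}^{14}C_{n} \;=\; 1.34066 \times
\frac{\left({}^{14}C/{}^{12}C\right)_{sample} - \left({}^{14}C/{}^{12}C\right)_{bg}}
     {\left({}^{14}C/{}^{12}C\right)_{cal} - \left({}^{14}C/{}^{12}C\right)_{bg}}
\times
\left( \frac{1 + \delta^{13}C_{cal}}{1 + \delta^{13}C_{sample}} \right)^{2}
```

In this form the squared factor carries the fractionation normalization, so the uncertainty in both δ^{13}C values enters the result alongside the three isotope ratios.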

The uncertainty in a ^{14}C measurement is then derived from the partial derivatives of **F**^{14}C_{n} with respect to each of the variables and is called **dF**^{14}C_{n}.

Every measured quantity in Eq. (1) has its own uncertainty. The uncertainty in (^{14}C/^{12}C)_{sample} is the statistical uncertainty (Poisson counting statistics). The uncertainty in (^{14}C/^{12}C)_{cal} is the uncertainty in the mean of the calibration reference (Oxalic Acid II). The uncertainty in the mean, and not the standard deviation, is the right choice here, because it is the uncertainty in the mean that determines the accuracy of the calibrated scale. Instead of dealing with this uncertainty on a per-batch basis, we use the average of the uncertainty in the mean over a considerable number of batches (the preceding 4 months; under normal circumstances typically 50 batches) to avoid statistical fluctuations in our estimate of the ^{14}C measurement uncertainty. In the first phase of operation, we did not do that, which led to an underestimation of this uncertainty (see Appendix 1). The relevant uncertainty in the next variable, (^{14}C/^{12}C)_{bg}, is the spread (standard deviation) in the background, also over the preceding 4 months. This spread is relevant for the variability in the individual backgrounds and thus also for the samples. The 4-monthly values are monitored on a monthly basis for sudden or gradual changes. The uncertainty in the variable δ^{13}C_{sample} (the δ^{13}C of the sample as measured by AMS) is derived from the raw measurements: the standard error of the mean of the independent raw measurements for each graphite sample (in normal routine, 8 independent measurements) is calculated and serves as the uncertainty in δ^{13}C_{sample}. For the last variable, δ^{13}C_{cal}, we again need the uncertainty in the mean; its typical value is ±0.1‰ (which makes this uncertainty source negligible in practice).
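Two of the quantities described above are simple to state in code: the standard error of the mean of the raw δ^{13}C readings, and the average of per-batch uncertainties in the mean over a window of batches. The following is a minimal sketch; the function names and all numerical values are hypothetical and are not taken from the measurements reported here.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Sample standard deviation divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)


def averaged_batch_uncertainty(batches):
    """Mean of the per-batch standard errors of the mean over a window of batches."""
    return statistics.mean(standard_error_of_mean(b) for b in batches)


# Hypothetical raw delta13C readings (permil) of one graphite sample, 8 passes.
raw_d13c = [-25.1, -24.9, -25.0, -25.2, -24.8, -25.0, -25.1, -24.9]
sem_d13c = standard_error_of_mean(raw_d13c)
print(f"uncertainty in delta13C_sample: {sem_d13c:.3f} permil")

# Hypothetical calibration-reference results (arbitrary units), five per batch.
cal_batches = [
    [1.000, 1.002, 0.999, 1.001, 0.998],
    [1.001, 0.999, 1.000, 1.003, 0.997],
]
print(f"averaged uncertainty in the mean: {averaged_batch_uncertainty(cal_batches):.5f}")
```

Averaging over many batches, rather than using each batch's own value, damps the batch-to-batch scatter that an estimate from only five references would otherwise have.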

The quadratic sum of the above-mentioned components times their partial derivatives results in the ^{14}C measurement uncertainty (**dF**^{14}C_{n}).
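Written out, this verbal description corresponds to the standard propagation law for uncorrelated inputs. The notation u(x_i) for the uncertainty assigned to each variable is introduced here for illustration only:

```latex
\mathbf{dF}^{14}C_{n} \;=\;
\sqrt{\sum_{i} \left( \frac{\partial\, \mathbf{F}^{14}C_{n}}{\partial x_{i}} \; u(x_{i}) \right)^{2}},
\qquad
x_{i} \in \left\{
\left({}^{14}C/{}^{12}C\right)_{sample},\;
\left({}^{14}C/{}^{12}C\right)_{cal},\;
\left({}^{14}C/{}^{12}C\right)_{bg},\;
\delta^{13}C_{sample},\;
\delta^{13}C_{cal}
\right\}
```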

This calculation, using the partial-derivatives approach, is a classical, linearized approximation of the real value of the ^{14}C measurement uncertainty. Correlations between the uncertainties in the different variables are ignored. We compared the outcome of this calculation to a Monte Carlo approach using the NIST Uncertainty Machine (NIST 2019). This is a web-based application for evaluating the measurement uncertainty associated with an output quantity defined by a measurement model of the form y = f(x_{0},…,x_{n}) (Lafarge and Possolo 2015). The Uncertainty Machine provides a numerically calculated probabilistic estimate of the uncertainty. The differences between the linearized approximation via the partial derivatives and the calculation via Monte Carlo turned out to be negligible (this is basically caused by the small size of the uncertainties relative to the values, making the linear approximation a very good one). Therefore, we preferred the ease of using the analytical method of the partial derivatives.
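The comparison between the linearized and the Monte Carlo estimate can be reproduced in outline with a few lines of code. The sketch below does not use the NIST Uncertainty Machine; it substitutes a plain Monte Carlo with independent Gaussian inputs. The calibration model `f14c` is an assumed conventional form, not necessarily the paper's Eq. (1), and all input values and uncertainties are hypothetical.

```python
import math
import random


def f14c(r_sample, r_bg, r_cal, d13_sample, d13_cal, f_cal=1.34066):
    """Assumed calibration model; delta values are fractions (permil / 1000)."""
    ratio = (r_sample - r_bg) / (r_cal - r_bg)
    return f_cal * ratio * ((1.0 + d13_cal) / (1.0 + d13_sample)) ** 2


def linearized_uncertainty(model, values, sigmas, rel_step=1e-6):
    """Quadrature sum of (numerical partial derivative x input uncertainty)."""
    total = 0.0
    for i, (v, s) in enumerate(zip(values, sigmas)):
        h = abs(v) * rel_step or rel_step  # fall back to rel_step when v == 0
        up = list(values)
        dn = list(values)
        up[i] = v + h
        dn[i] = v - h
        deriv = (model(*up) - model(*dn)) / (2.0 * h)  # central difference
        total += (deriv * s) ** 2
    return math.sqrt(total)


def monte_carlo_uncertainty(model, values, sigmas, n=20_000, seed=1):
    """Standard deviation of the model output for independent Gaussian inputs."""
    rng = random.Random(seed)
    draws = [
        model(*(rng.gauss(v, s) for v, s in zip(values, sigmas)))
        for _ in range(n)
    ]
    mean = sum(draws) / n
    return math.sqrt(sum((d - mean) ** 2 for d in draws) / (n - 1))


# Hypothetical inputs: isotope ratios in arbitrary units, deltas as fractions.
values = [0.60, 0.002, 1.20, -0.025, -0.0178]
sigmas = [0.0012, 0.0004, 0.0010, 0.0005, 0.0001]

lin = linearized_uncertainty(f14c, values, sigmas)
mc = monte_carlo_uncertainty(f14c, values, sigmas)
print(f"linearized: {lin:.6f}   Monte Carlo: {mc:.6f}")
```

With input uncertainties this small relative to the values, the two estimates should differ only by the sampling noise of the Monte Carlo run, which is the behaviour described in the text.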

Researchers often report uncertainties that do not contain all relevant sources of uncertainty, so their uncertainty estimates are usually too low. This is mostly because some of the uncertainty sources are very hard to estimate properly. The same holds for BATS, the MICADAS data-reduction software (Wacker et al. 2010a). This data analysis package is provided with the MICADAS, and it is a very powerful and versatile tool, so most groups operating a MICADAS use it, including us. The ^{14}C measurement uncertainty that BATS provides is based on the Poisson statistics, the so-called molecular correction (^{13}C^{+} resulting from broken-up molecules), and the scatter of the blank samples. As the true ^{14}C measurement uncertainty is underestimated by this combination (and the programmers realize that, of course), BATS allows the user to add an additional error of arbitrary size. This approach is in line with the “dark uncertainty” philosophy (see below). However, we prefer to explicitly account for all the uncertainty contributions as explained above, such that we produce the most reliable estimate for the expanded uncertainty in our ^{14}C measurement results.

For our approach, it is of course essential to be able to check if the ^{14}C measurement uncertainty (**dF**^{14}C_{n}) that we calculate is indeed a good measure of that uncertainty. Therefore, we monitor the relationship between those calculated uncertainties and the realized uncertainties, where the latter are determined from the spread in (long) time series of various reference materials. Ideally, their ratio should be around 1.

This approach dates back to Birge (1932). The ^{14}C measurement uncertainties, as we calculate them along with the measurands, are called “internal errors”, whereas the uncertainties observable from the spread in measurands are called “external errors”. The internal errors are then the expectation; the external errors are the realization of the uncertainty (Birge (1932) calls these the “prediction” and “answer to the prediction”, respectively). Their ratio is the reduced $\chi _{red}^2$ (“chi-squared”). If the predicted internal error is correct, the value of $\chi _{red}^2$ will be 1 within a certain statistical variability. However, if certain sources of uncertainty have not been accounted for in the internal error, $\chi _{red}^2$ will be larger than 1. This is often the case in interlaboratory intercomparisons. The apparent extra source of uncertainty is called “dark uncertainty,” and there are different approaches for its calculation. The option in BATS to add extra uncertainty is in fact a possibility to account for this dark uncertainty. The “error multiplier” that has been used in the radiocarbon world can be interpreted along the same lines (Scott et al. 2007).
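The internal/external comparison can be sketched with the common textbook definitions for inverse-variance weighted data. These definitions are an assumption on our part and may differ in detail from the expressions in Appendix 2; under them, $\chi _{red}^2$ equals the square of the ratio of external to internal error. The function name and the example numbers are hypothetical.

```python
import math


def birge_statistics(values, sigmas):
    """Weighted mean, internal and external error of the mean, reduced chi-squared."""
    n = len(values)
    if n < 2 or n != len(sigmas):
        raise ValueError("need at least two values with matching uncertainties")
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / wsum
    chi2_red = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / (n - 1)
    internal = math.sqrt(1.0 / wsum)            # predicted from the stated uncertainties
    external = internal * math.sqrt(chi2_red)   # realized from the observed spread
    return {
        "mean": mean,
        "internal": internal,
        "external": external,
        "chi2_red": chi2_red,
    }


# Hypothetical series of a reference material (F14C in %, with calculated uncertainties).
result = birge_statistics([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
print(result)
```

In this made-up series the observed scatter is twice what the stated uncertainties predict, so the reduced chi-squared is about 4, which would point to unaccounted ("dark") uncertainty.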

Birge’s (1932) original work has been taken up and extended by statisticians since then; for recent developments see, for example, Rukhin (2009), Koepke et al. (2017), and Merkatas et al. (2019). For the sake of completeness, we give the expressions for the necessary quantities (weighted means, internal and external errors) in Appendix 2.

In our attempt to account for all sources of uncertainty, we strive for the absence of “dark uncertainty”. Nevertheless, it is very possible, even likely, that such uncertainty still exists, as we cannot account quantitatively for variability in the chemical preparation and combustion, even though some of this variability is contained in the standard deviation of the background material, and in the calibration error.

We have two sources available with which we can check the completeness of our uncertainties. The secondary references provide long records, the spreads of which deliver the external errors. Sample duplicates, on the other hand, provide only two independent ^{14}C measurements, **F**^{14}C_{n}^{1} and **F**^{14}C_{n}^{2}, each with its individual ^{14}C measurement uncertainty, **dF**^{14}C_{n}^{1} and **dF**^{14}C_{n}^{2}. The quadratic sum of the individual measurement uncertainties gives the uncertainty **dF**^{14}C_{n(duplicates)}; the difference between those two measurements gives ${\Delta _{duplicates}}$.

The ratio between ${\Delta _{duplicates}}$ and **dF**^{14}C_{n(duplicates)}, ƒ_{σ} = ${\Delta _{duplicates}}$ / **dF**^{14}C_{n(duplicates)}, is a value that should scale according to Gaussian expectations: for a large number of duplicates the average value of ƒ_{σ} should be ≈0 and σ(ƒ_{σ}) should be ≈1, and thus in 68% of the cases the value should be between –1 and 1. If the standard deviation of this distribution of ƒ_{σ} is in general too large, this implies that the calculated ^{14}C measurement uncertainties are too low and that some “dark uncertainty” is present. This ƒ_{σ} is calculated for all duplicates, and the spread of this distribution, σ(ƒ_{σ}), is a measure for the expanded uncertainty for each of the various types of duplicates.
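The duplicate check described above can be sketched as follows. We read the "quadratic sum" as the square root of the sum of squares of the two individual uncertainties, which is an interpretation; the function names and the duplicate values are hypothetical.

```python
import math
import statistics


def f_sigma(f1, df1, f2, df2):
    """Difference of a duplicate pair divided by the quadrature sum of its uncertainties."""
    return (f1 - f2) / math.hypot(df1, df2)


def summarize(pairs):
    """Mean and standard deviation of f_sigma over a list of (f1, df1, f2, df2) tuples."""
    scores = [f_sigma(*p) for p in pairs]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical duplicate results (F14C in %, each with its calculated uncertainty).
pairs = [
    (50.00, 0.30, 49.50, 0.40),
    (75.10, 0.20, 75.30, 0.20),
    (12.45, 0.10, 12.40, 0.10),
]
mean_f, sd_f = summarize(pairs)
print(f"mean f_sigma = {mean_f:.2f}, sigma(f_sigma) = {sd_f:.2f}")
```

A mean near 0 and a standard deviation near 1 would indicate that the calculated uncertainties describe the duplicates well; a standard deviation clearly above 1 would indicate missing uncertainty. Three pairs, as used here, are of course far too few for such a conclusion.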

## RESULTS FOR OUR UNCERTAINTIES

For our previous HV AMS, the calculated uncertainty according to the propagation of uncertainties from Eq. (1) has proven to be an adequate estimate for the expanded uncertainty from all different sources in the total process from chemical pretreatment until measurement. This was no surprise, however, as all contributions that could not be accounted for (chemical pretreatment variability) were overshadowed by the Poisson statistics contribution to the calculated ^{14}C measurement uncertainty.

For the MICADAS, however, this Poisson contribution is much smaller than for the HV AMS, due to the increased overall efficiency. When this pure measurement uncertainty decreases, further investigation of the other contributions to the expanded uncertainty is possible, and in fact necessary.

The expanded uncertainty in the final result is composed of four major contributions. These contributions, in order from latest to earliest in the sample handling process, are as follows: the contribution from the actual ^{14}C measurement (Eq. 1), from the graphitization, from the CO_{2} preparation and from the chemical pretreatment. The first contribution, the ^{14}C measurement uncertainty (**dF**^{14}C_{n}, Eq. 2), is the minimum uncertainty and is present in all the ^{14}C determinations. As this study is restricted to measurements on graphite cathodes, the extra uncertainty of the graphitization step, the second contribution, is automatically also incorporated in all measurements from background materials, secondary references (cat. 2) and sample duplicates (cat. 2). The third contribution, from the CO_{2} extraction, is visible in the combustion background materials wood (bgw) and collagen (bgc), in the individually combusted secondary references (cat. 3), and in the CO_{2} preparation sample duplicates (cat. 3, same chemical pretreatment, three different following steps). Finally, the fourth contribution, the uncertainty added in the chemical pretreatment, can be investigated with the pretreatment sample duplicates (cat. 4, where everything in the total process is different, see Figure 1). These four major contributions to the expanded uncertainty are treated in the following sections.

As mentioned before, the first major contribution, the uncertainty in a ^{14}C measurement, is derived from the partial derivatives of **F**^{14}C_{n} with respect to each of the variables (**dF**^{14}C_{n}). Figure 2 shows the typical contribution of the uncertainty in each variable from a representative measurement batch to this calculated ^{14}C measurement uncertainty. The quadratic sum of those components results in the ^{14}C measurement uncertainty (line f, black).

The uncertainty in (^{14}C/^{12}C)_{sample} is the statistical uncertainty (Poisson counting statistics) (line a, gray). The Poisson counting statistics is still the largest contribution to **dF**^{14}C_{n}. For low ^{14}C activities, the uncertainty is dominated by the spread in the background materials (line c, green).

This calculated ^{14}C measurement uncertainty needs to be put to the test. We expect it to be a valid uncertainty for pure gas samples, but for samples requiring pretreatment some extra “dark” uncertainty probably plays a role.

The first thorough check on our calculated uncertainties is given by the long-term spread of our secondary references. Table 1 provides the summary statistics for those secondary references. The references with a graphitization step only (cat. 2) are Rommenhöller (background), IAEA-C8 (bulk), IAEA-C7 (bulk), GS-51 (bulk), and Oxalic Acid II (bulk). Table 1 contains both the external and internal standard deviations (for the calculation equations see Appendix 2), and also $\chi _{red}^2$. The last column of Table 1 gives the probability that the difference between both standard deviations is significant (based on the statistics of the $\chi _{red}^2$ distribution).

* When the significant digit is between 1 and 4 an extra digit is shown.

** Recently a memory problem in the bulk combustion line was discovered to which IAEA-C7 and IAEA-C8 were vulnerable. This is the reason why the measured ^{14}C values of those cat. 2 secondary references are slightly different from the assigned values. For the purpose of this paper this has no further consequences.

*** For every batch the mean value of the Oxalic Acid II references is calibrated to become the assigned value of 134.066%. Therefore, its overall spread is not representative.

For four of the five graphitization references (cat. 2), $\chi _{red}^2$ is <1, implying (with on average ≈ 85% probability) that the realized, external measurement uncertainty is somewhat smaller than the calculated (internal) uncertainty. In other words, our calculated uncertainty (**dF**^{14}C_{n}) might be slightly overestimated.

The contribution from combustion is quantified by the secondary references that are individually combusted (cat. 3). Those are also listed in Table 1, and they have $\chi _{red}^2$ values somewhat larger than 1, indicating that the calculated **dF**^{14}C_{n} is a slight underestimation, and that there is some “dark” uncertainty present. Oxalic acid has a $\chi _{red}^2$ much smaller than 1. However, the spread of the Oxalic Acid II is not representative, because this is used as calibration material and therefore for every batch the mean value is calibrated to become the assigned value of 134.066%.

The background wood (bgw) and background collagen (bgc) samples were chemically pretreated in large quantities, but individually combusted. Therefore, this pretreatment cannot influence their spread and those background references can be used as CO_{2} preparation duplicates (cat. 3). For bgw, we indeed get a result comparable to the other cat. 3 materials. For background collagen, however, we observe the highest $\chi _{red}^2$ of all materials: 1.7. The reason for this high and significant value is not well understood, and more data are needed, as this value is calculated from only 12 data points.

Uncertainty estimates determine the maximum age that can be measured in a system. The standard deviation of background collagen implies that the minimum **F**^{14}C_{n} distinguishable from background values is two times 0.05%, so 0.1% on graphite samples (Stuiver and Polach 1977; van der Plicht and Hogg Reference van der Plicht and Hogg2006). This corresponds to 55,000 years BP. However, as the absolute **F**^{14}C_{n} values for the background wood and the background collagen are 0.23–0.25%, even though these materials are known to be of infinite age, we never report ages older than the age corresponding to these activities (48,000 years BP) (van der Plicht and Palstra Reference van der Plicht and Palstra2016).
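The age limits quoted above can be checked with a few lines of Python. This is a minimal sketch, assuming the conventional age relation t = −8033 · ln(F) of Stuiver and Polach (1977); the function and variable names are illustrative and not taken from the laboratory's own software.

```python
import math

# Conventional (Libby) mean-life in years, as used for conventional 14C ages.
LIBBY_MEAN_LIFE = 8033.0


def conventional_age_bp(f14c_percent: float) -> float:
    """Conventional radiocarbon age (years BP) for a normalized F14C in percent."""
    if f14c_percent <= 0:
        raise ValueError("F14C must be positive to define a finite age")
    return -LIBBY_MEAN_LIFE * math.log(f14c_percent / 100.0)


# Detection limit from the text: two times the 0.05% background spread.
detection_limit_age = conventional_age_bp(0.10)

# Measured background activities from the text: 0.23-0.25%.
background_age_young = conventional_age_bp(0.25)
background_age_old = conventional_age_bp(0.23)

print(f"limit from spread:     {detection_limit_age:.0f} years BP")
print(f"limit from background: {background_age_young:.0f} to "
      f"{background_age_old:.0f} years BP")
```

Under this assumption, the 0.1% limit lands in the 55,000 years BP range and the 0.23–0.25% background activities in the 48,000 years BP range, in line with the values stated in the text.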

Figure 3 visualizes the average results over the first full year of measurements from the secondary references. The calculated ^{14}C measurement uncertainty (from Eq. 2, **dF**^{14}C_{n}, averaged over the data of the last one and a half years) is shown in black again (line a). The long-term spread of our secondary CO_{2} references (cat. 2) contains contributions from the actual ^{14}C measurement and the graphitization, but no further variability due to individual combustion, and **dF**^{14}C_{n} is expected to be a good estimate of their uncertainty (see above). The standard deviation of the cat. 2 references is shown in blue (line b). The realized spread in the long-term measurements is lower than line a and, for higher **F**^{14}C_{n} values, even approaches the Poisson statistics uncertainty (which is shown as the gray line (a) in Figure 2).

The standard deviation of cat. 3 secondary references is shown by line c (red) in Figure 3. Six combustion references from Table 1 (cat. 3, not oxalic acid) are used to construct this line. The displayed spread at zero percent **F**^{14}C_{n} is the spread of bgw. In practice there are many more bgw than bgc measurements; therefore, bgc measurements are disregarded. The GS-35 measurements are disregarded in Figure 3 as well, as this material is a carbonate and therefore not combusted.

The differences between the external standard deviation of the secondary references that have only a graphitization step (cat. 2) and that of the secondary references that have both a graphitization and a combustion step (cat. 3) are significant (lines b and c in Figure 3) and amount to ≈ 30%. As expected, the combustion process contributes to a greater spread in the data. $\chi _{red}^2$ is the ratio of the squared external standard deviation and the squared ^{14}C measurement uncertainty (**dF**^{14}C_{n}^{2}). The average $\chi _{red}^2$ for cat. 3 references is 1.3. To represent the uncertainty in these combusted references, our calculated (internal) uncertainty **dF**^{14}C_{n} (line a) needs to be multiplied by ≈ 1.15 (which is the square root of the average $\chi _{red}^2$ of the appropriate samples in Table 1).
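The relation between $\chi _{red}^2$ and the multiplication factor can be written out as a short Python sketch. It only restates the definition given above (squared external standard deviation over squared calculated uncertainty); the function names are illustrative.

```python
import math


def chi2_reduced(external_sd: float, calculated_uncertainty: float) -> float:
    """Ratio of the squared external spread to the squared calculated uncertainty."""
    return (external_sd / calculated_uncertainty) ** 2


def multiplication_factor(chi2_red: float) -> float:
    """Factor by which the calculated uncertainty must be scaled to match the spread."""
    return math.sqrt(chi2_red)


# Average chi2_red of the cat. 3 references quoted in the text.
factor_cat3 = multiplication_factor(1.3)
print(f"multiplication factor for cat. 3: {factor_cat3:.2f}")
```

The square root of 1.3 is close to 1.14, which the text rounds to ≈ 1.15.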

The next source of information about contributions to the expanded uncertainty comes from sample duplicates in various phases of the process (see Figure 1). Data from air samples from our atmospheric station at Lutjewad (the station is described in Van der Laan-Luijkx et al. Reference van der Laan-Luijkx, Karstens, Steinbach, Gerbig, Sirignano, Neubert, van der Laan and Meijer2010) provide information about quantification of the graphitization uncertainty (cat. 2 duplicates). CO_{2} is dissolved in an alkaline solution during the sampling of atmospheric air and for ^{14}C measurement, this CO_{2} is released again by using acid. The released CO_{2} fraction is divided into three equal portions, which makes these samples graphitization triplicates. The average spread in these triplicates is 0.18%, which compares favorably to the calculated ^{14}C measurement uncertainty of 0.16%.

The unknown sample CO_{2} preparation duplicates (cat. 3) provide information about the third major contribution to the expanded uncertainty, the contribution from combustion. For the CO_{2} preparation duplicates of unknown samples, the standard deviation of the distribution of ƒ_{σ}, σ(ƒ_{σ}), is on average 1.4, meaning that **dF**^{14}C_{n} had to be increased by 40% to match the spread (see Table 2). As mentioned before, for the secondary references from cat. 3 (line c in Figure 3) **dF**^{14}C_{n} should be enlarged by approximately 15%. We attribute this large difference between sample duplicates and secondary references to inhomogeneity in the samples and, connected to this inhomogeneity, the varying success in homogeneously removing exogenous contaminants from the samples. It illustrates that the CO_{2} preparation from inhomogeneous samples contributes much more to the expanded uncertainty than the CO_{2} preparation from pure materials. If unaccounted for, this would lead to “dark uncertainty” in our results, which would come to light, for example, in intercomparisons. We therefore quantify this extra uncertainty by randomly re-measuring samples on a regular basis.
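How a value of σ(ƒ_{σ}) might be obtained from duplicate pairs is sketched below. Two assumptions are made here that are not confirmed by this section: ƒ_{σ} is taken as the difference between two duplicates divided by their combined calculated uncertainty, and its spread is taken as the root-mean-square about the expected mean of zero. The duplicate values are hypothetical and serve only as an illustration.

```python
import math


def f_sigma(value_1: float, unc_1: float, value_2: float, unc_2: float) -> float:
    """Duplicate difference in units of the combined calculated uncertainty (assumed form)."""
    return (value_1 - value_2) / math.sqrt(unc_1 ** 2 + unc_2 ** 2)


def spread_of_f_sigma(pairs) -> float:
    """Root-mean-square of f_sigma over all pairs, taken about an expected mean of zero."""
    ratios = [f_sigma(*pair) for pair in pairs]
    return math.sqrt(sum(r ** 2 for r in ratios) / len(ratios))


# Hypothetical duplicate pairs: (F14C_1, dF_1, F14C_2, dF_2), all in percent.
example_pairs = [
    (85.10, 0.20, 85.52, 0.20),
    (62.40, 0.18, 62.15, 0.18),
    (45.05, 0.15, 45.30, 0.16),
    (91.80, 0.21, 91.35, 0.22),
]

multiplier = spread_of_f_sigma(example_pairs)
print(f"sigma(f_sigma) = {multiplier:.2f}")
```

If the calculated uncertainties described the duplicates fully, the spread would be close to 1; a larger value indicates by how much **dF**^{14}C_{n} would have to be scaled.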

None of the described secondary references requires a chemical pretreatment, because the references are pure materials. Therefore, the fourth major contribution to the expanded uncertainty can only be quantified by sample pretreatment duplicates (cat. 4) (among which is the known sample VIRI F, horse bone). Monitoring the pretreatment duplicates (cat. 4) revealed that the expanded uncertainty for unknown random samples (bones, charcoal, wood) is **dF**^{14}C_{n} increased by a factor of 1.6 (see Table 2). Splitting the pretreatment duplicates into different materials would be desirable but is impeded by the low number of measurements.

Interestingly, the VIRI F horse bone pretreatment duplicates (cat. 4), grouped in pairs, show a smaller σ(ƒ_{σ}) of 1.2, suggesting a lower expanded uncertainty. The VIRI F measurements were paired to allow a direct comparison with the duplicates we measure at random on unknown samples. Since a data set of nine pairs is small, we also calculated ƒ_{σ} by comparing the standard deviation of all VIRI F measurements with the average calculated **dF**^{14}C_{n}. The result is similar to that of the paired method (as it should be). The reason for this smaller σ(ƒ_{σ}) is unknown; possibly this bone sample was less contaminated than other samples. The sample duplicates from tree rings from which the α-cellulose fraction was extracted (cat. 4) also showed a smaller σ(ƒ_{σ}), 1.4, than the various other unknown samples (for which σ(ƒ_{σ}) is 1.6, as mentioned earlier). The reason is probably that most of the contaminants and other naturally occurring compounds were removed during pretreatment, as the extracted α-cellulose is a more uniform biopolymer. As the σ(ƒ_{σ}) of cat. 4 duplicates from more homogeneous materials hardly differs from the σ(ƒ_{σ}) of cat. 3 duplicates, the additional sample handling in the chemical pretreatment apparently does not contribute to the increase of the expanded uncertainty. The σ(ƒ_{σ}) of 1.6 for cat. 4 pretreatment duplicates is therefore probably due merely to inhomogeneity of the sample material and, perhaps related to that, the variability of the success rate of the chemical pretreatment.

The expanded uncertainties for various processes and samples, using the calculated **dF**^{14}C_{n} and the σ(ƒ_{σ}) factors, are shown in Table 3. The use of σ(ƒ_{σ}) factors is in fact an error multiplier approach (Scott et al. Reference Scott, Cook and Naysmith2007). Of course, it would be preferable to also quantify and include all other uncertainty sources, but as these are next to impossible to determine, this pragmatic solution is acceptable. Still, multiplication factors should be as close to unity as possible; otherwise the uncertainty analysis apparently fails to include the major sources of uncertainty.

* α-cellulose pretreatment is an exception (see Table 2), for which we can use the data in the 4th column.
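The error multiplier approach can be illustrated with a short Python sketch. The multiplication factors are the ones quoted in this paper; the dictionary keys, the example values for **F**^{14}C_{n} and **dF**^{14}C_{n}, and the first-order conversion to an age uncertainty (8033 · dF/F) are assumptions made for illustration only.

```python
LIBBY_MEAN_LIFE = 8033.0  # years, conventional (Libby) mean-life

# Multiplication factors quoted in the text; the key names are illustrative.
FACTORS = {
    "air_co2": 1.1,                   # graphitization only
    "combusted_pure_material": 1.15,  # cat. 3 secondary references
    "alpha_cellulose": 1.4,           # pretreated, homogeneous material
    "pretreated_sample": 1.6,         # bone, wood, charcoal
}


def expanded_uncertainty(d_f14c: float, sample_type: str) -> float:
    """Calculated measurement uncertainty scaled by the factor for this sample type."""
    return FACTORS[sample_type] * d_f14c


def age_uncertainty_years(f14c: float, d_f14c: float) -> float:
    """First-order propagation of an F14C uncertainty to conventional age (years)."""
    return LIBBY_MEAN_LIFE * d_f14c / f14c


# Hypothetical bone sample: F14C = 50.0%, calculated dF14C = 0.12%.
expanded = expanded_uncertainty(0.12, "pretreated_sample")
print(f"expanded uncertainty: {expanded:.3f}% "
      f"(about {age_uncertainty_years(50.0, expanded):.0f} years)")
```

For these hypothetical values, the reported age uncertainty grows from roughly 19 years to roughly 31 years once the factor of 1.6 is applied.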

During the early phases of measuring with the MICADAS, a realistic determination of the expanded uncertainty was not possible due to limited available data and so to encompass uncertainties arising from various sources, we used a provisional multiplication factor of 1.5 for the ^{14}C measurement uncertainty for every sample. Our present assessment showed that this “educated guess” was quite appropriate.

### Attempts to Reduce the Expanded Uncertainty

The main goal of this study was to determine the expanded uncertainty of a ^{14}C measurement and to quantify the contributions to this uncertainty. As an extension of the project while performing and analyzing all the measurements that were described in this paper, we also tried to reduce this expanded uncertainty. One obvious possibility is to improve the counting statistics by increasing the measurement time. The Poisson uncertainty will of course gradually decrease by collecting more counts during a longer measurement time, but at the same time other uncertainty sources in the measurement (calibration stability, ^{13}C signal stability) might increase, and after a certain point will outbalance the Poisson gain.
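The trade-off described above can be made concrete with a small Python sketch. It assumes a simplified model in which the relative Poisson uncertainty falls as one over the square root of the number of counts, while all other sources are lumped into a single time-independent term added in quadrature (the text notes that these other sources might even grow with time, which this model ignores). The count rate and the size of the constant term are hypothetical.

```python
import math


def relative_poisson_uncertainty(count_rate: float, seconds: float) -> float:
    """Relative counting uncertainty, 1/sqrt(N), for N = count_rate * seconds."""
    return 1.0 / math.sqrt(count_rate * seconds)


def total_relative_uncertainty(count_rate: float, seconds: float,
                               other_relative: float) -> float:
    """Poisson term and a time-independent term combined in quadrature."""
    return math.hypot(relative_poisson_uncertainty(count_rate, seconds),
                      other_relative)


COUNT_RATE = 100.0  # hypothetical 14C counts per second
OTHER = 0.0015      # hypothetical time-independent relative uncertainty (0.15%)

for seconds in (1700, 2400, 4000, 10000):
    poisson = relative_poisson_uncertainty(COUNT_RATE, seconds)
    total = total_relative_uncertainty(COUNT_RATE, seconds, OTHER)
    print(f"{seconds:>6} s: Poisson {100 * poisson:.3f}%, total {100 * total:.3f}%")
```

In such a model the total uncertainty flattens toward the constant term, so each additional second of measurement time buys less than the Poisson term alone would suggest.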

We tried this out by performing an experiment where we measured a batch with a net measurement time exceeding 10,000 seconds per sample. The batch contained (among others) seven Oxalic Acid II references and eight IAEA-C8 references, all produced from CO_{2} from their respective bulk materials (thus cat. 2).

In this experiment with very long measurement times, we calculated the measurement standard deviation after 1700, 2300, 3500 until 10,000 seconds measurement time per sample. These results provided important insights into the optimum measurement time, but as the same cathodes were analyzed for the comparison, the data are not independent of each other.

The main conclusion is that, in contrast to the Poisson uncertainty from individual references, which obviously decreased with measurement time, the observed spread in the Oxalic Acid II references (a) did not significantly improve for measurement times over 4000 seconds. A plausible reason is the random contribution from the different graphitization reactions, which causes spread in the ^{13}C stability. In all cases, a longer measurement time obviously leads to a decrease in (the calculated) **dF**^{14}C_{n}. On the other hand, the standard deviation of IAEA-C8 (b) still does decrease with time. However, the gain in years from 2400 seconds measurement time (magenta line) to 4000 seconds is only 5 years (BP) for IAEA-C8 (for Oxalic Acid II only 2 years BP). This improvement is hardly ever worth the investment of nearly doubling the measurement time. Therefore, our routine measurement time of 2400 seconds is optimally chosen.

We conducted several independent batches with various measurement times as well. The calculated ^{14}C measurement uncertainty (**dF**^{14}C_{n}) for those independent measurements also revealed no significant improvement after 2400 seconds. All measurements in this paper were conducted at 2400 seconds.

Table 1 (comparison of secondary references from cat. 2 and 3) and Figure 3 (lines b and c) showed the influence of the combustion process on the expanded uncertainty. A possible, although unlikely, cause might be a memory effect from one combustion to another in the combustion setup and cryogenic collection system. Experiments with blank (background material) combustions after Oxalic Acid II references did not show a significant memory effect in the combustion setup. Still, to be absolutely sure, we recently started to combust an empty tin capsule before every individual combustion to see whether we could more definitively understand and even determine the size of this potential memory effect. The extra oxygen pulse should reduce possible leftover material that was not completely converted into CO_{2}. Further data collection is needed to determine whether the additional blank combustions improve the final reported uncertainty. The other contamination source could be the cryogenic collection system. This system is more than 20 years old, and the constant freezing and heating of the glass may have introduced micro cracks that function as active adsorption spots for the exchange of CO_{2} and thus increase the memory from sample to sample. Regular refreshing of the glass system may reduce this contamination and hence the contribution to the expanded uncertainty. The replacement of all the glass components of the cryogenic collection system will be a major operation; therefore, further research will be needed to see whether this will indeed reduce the contribution to the expanded uncertainty.

The contribution from chemical pretreatment is visible in Table 2 for cat. 4 sample duplicates, but also for cat. 3 sample duplicates, where the multiplication factor is considerably larger than for pure substances (Table 1, ~1.15). A large part is apparently due to the inhomogeneity or intractable contamination of the sample, and therefore this contribution cannot easily be reduced. Obviously, running duplicate (or multiple) samples would help to some extent, as inhomogeneities and the varying success of removing contaminants would average out. However, for the vast majority of samples this is not an option due to the increased costs involved (and sometimes the need for more material). Automation of the pretreatment, as a means of standardizing the process and reducing random errors, will be investigated in the near future.

## DISCUSSIONS AND RECOMMENDATIONS

The new generation of high-yield accelerator mass spectrometers delivers very small measurement uncertainties, due to higher count rates. The reported uncertainty in the final outcome, however, must be the expanded uncertainty, a firm measure of the spread that can be expected in case of multiple analyses of the same material, by one or more laboratories.

When performing sample duplicates in the same laboratory, and monitoring the spread in the data, it is already clearly apparent that the ^{14}C measurement uncertainty is too small to serve as the reported uncertainty. Therefore, in the field of ^{14}C measurements it is very common to use a multiplying factor of the ^{14}C measurement error for the uncertainty in the final outcome (Scott et al. Reference Scott, Cook and Naysmith2007). This is in line with the “dark uncertainty” concept, well known from intercomparison of results between different laboratories (Koepke et al. Reference Koepke, Lafarge, Possolo and Toman2017; Merkatas et al. Reference Merkatas, Toman, Possolo and Schlamminger2019).

However, we deemed it necessary to achieve a better and more thorough understanding of the build-up of uncertainty in the whole chain from chemical pretreatment to the final measurement, such that we can report a reliable expanded uncertainty in our publications, and to our customers. To report such a reliable uncertainty is obviously very important for participation in round robin tests and other intercomparisons.

For laboratories in the field of ^{14}C, we recommend measuring full duplicates on all the kinds of samples that are normally measured, noting, of course, that most laboratories already have such protocols implemented. A nice example of such a practice was described in a recently published work by Sookdeo et al. (Reference Sookdeo, Kromer, Büntgen, Friedrich, Friedrich and Helle2019), where in addition to process duplicates, the authors also emphasize including process backgrounds for high-quality measurements. The results of the measured duplicates give a very good insight into the quality of the measurements. This protocol of measuring duplicates is especially useful when participating in intercomparisons. In the optimal case, where all participants estimate their expanded uncertainty well, the “dark uncertainty” would be minimal. Weighting the data for averaging with the expanded uncertainty would then make sense. Therefore, we recommend that in future intercomparisons a report should be added on the basis used for the stated uncertainty.

For quality improvement and thus reduced expanded uncertainty in the final outcome, it is recommended to measure secondary references in the various steps of the process from CO_{2} preparation up to the actual ^{14}C measurement. This gives insights into where improvements in quality can be achieved and which steps are limiting the further reduction of the expanded uncertainty. Using homogeneous materials has the advantage that the effects are clear. On the other hand, one should not claim the results of such homogeneous secondary references as valid for the real samples, as our work has shown that “real samples” show larger spread, most likely due to variability in the success of removing contamination in the pretreatment process.

## CONCLUSIONS

After a detailed uncertainty analysis using measurements from secondary references and sample duplicates during the first year and a half of MICADAS operation in Groningen, we are confident that we can report an expanded uncertainty that is representative of the real uncertainty in our final ^{14}C measurements. This expanded uncertainty incorporates contributions from the chemical pretreatment, the CO_{2} preparation, the graphitization and the ^{14}C measurement. We systematically evaluate the contributions to the ^{14}C measurement uncertainty. This uncertainty is the basis for the expanded (final) uncertainty. As our work has shown, for samples such as bone, wood or charcoal, which undergo chemical pretreatment, combustion, graphitization, and ^{14}C measurement, the calculated ^{14}C measurement uncertainty must be multiplied by a factor of 1.6 to get the expanded uncertainty.

For more homogeneous samples, like a one-year tree ring sample where $\alpha$-cellulose is collected, this multiplication factor is 1.4. Similarly, for CO_{2} samples collected from air, this factor is 1.1.

The achievement of the present work is twofold: first, we have checked our carefully calculated ^{14}C measurement uncertainty and shown that it is a reliable basis for reporting the final uncertainty; second, we have established evidence-based multiplication factors for the various sample types. Future ring tests will benefit from this method of uncertainty estimation.

## ACKNOWLEDGMENTS

We would like to thank the staff of the Centre for Isotope Research in Groningen, Dicky van Zonneveld, Henk Been, Marc Bleeker, Fsaha Ghebru, Berthe Verstappen-Dumoulin, Sven de Bruijn, Regina Linker, Henk Jansen, Margot Kuitems, Janette Spriensma, and Patricia Wietzes for their work. Without them we could not have performed this study (or run a ^{14}C laboratory at all). We would like to thank Antonio Possolo (NIST) for pointing us to papers providing statistical background. We would also like to thank the two anonymous referees for their careful review and valuable suggestions for improvement.

## APPENDIX 1 Comparison of ^{14}C uncertainty calculation with MICADAS batch averages and calculations with four-monthly averages

During the first months of operation of the MICADAS, from September 2017 until July 2018, the calculation of the ^{14}C measurement uncertainty (Eq. 2, **dF**^{14}C_{n}) was performed using the same methodology as with our former HV accelerator, namely based on daily or batch values for the error in the mean of the Oxalic Acid II references and the daily or batch values for the spread in the backgrounds. Table 4 provides the summary statistics for two background materials, namely background wood and Rommenhöller gas, for both situations (the data for the period after July 2018 are also shown in Table 1). From the data, it is apparent that the calculation of the ^{14}C measurement uncertainty before July 2018 for background materials gives an uncertainty that underestimates the spread of the data. The internal error is too small (by a factor of 3). After July 2018 the uncertainty calculation is based on four-monthly averaged values, instead of the daily (batch) values. The four-monthly average value of the background is more reliable as a typical background for the samples, and the spread in the data of the last four months is more realistic than the spread from only four backgrounds in one batch. In other words, the variability between batches is larger than the variability within one batch.

The actual external standard deviations of the measurements are approximately the same for both periods, as they should be, as the actual measurements were performed in the same way before and after that date.

## APPENDIX 2 Calculations of weighted means, internal and external errors and $\chi _{red}^2$

To calculate the weighted average (${\bar x_w}$) of data from the secondary references, the weighting factor (${w_i}$) is given by the reciprocal of the variance (that is, the square of the calculated ^{14}C measurement uncertainty, **dF**^{14}C_{n}) of the ^{14}C measurement value (*n* is the number of measurements and *i* represents a single measurement).
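The quantities defined in this appendix can be sketched in Python as follows. The weighted mean, the internal error in the mean and the external error in the mean follow the definitions given in the text. The exact form of the external standard deviation is an assumption here: a weighted form is used that reduces to the ordinary sample standard deviation for equal weights and that makes the squared ratio of external to internal error in the mean equal to $\chi _{red}^2$. The function name and the returned keys are illustrative.

```python
import math


def weighted_statistics(values, uncertainties):
    """Inverse-variance weighted mean with internal and external error estimates."""
    n = len(values)
    if n < 2 or n != len(uncertainties):
        raise ValueError("need at least two values, each with an uncertainty")

    weights = [1.0 / u ** 2 for u in uncertainties]  # w_i = 1 / dF_i^2
    sum_w = sum(weights)
    mean_w = sum(w * x for w, x in zip(weights, values)) / sum_w

    # Assumed weighted form of the external standard deviation.
    weighted_ss = sum(w * (x - mean_w) ** 2 for w, x in zip(weights, values))
    sd_ext = math.sqrt(weighted_ss / ((n - 1) / n * sum_w))

    sem_ext = sd_ext / math.sqrt(n)   # external error in the mean
    sem_int = 1.0 / math.sqrt(sum_w)  # internal error in the mean (SEWM)
    chi2_red = (sem_ext / sem_int) ** 2

    return {
        "weighted_mean": mean_w,
        "sd_ext": sd_ext,
        "sem_ext": sem_ext,
        "sem_int": sem_int,
        "chi2_red": chi2_red,
    }


# Hypothetical F14C values (%) of one reference material with calculated uncertainties.
stats = weighted_statistics([15.02, 15.10, 14.95, 15.06], [0.05, 0.06, 0.05, 0.05])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With this choice, a $\chi _{red}^2$ near 1 means that the observed spread matches the calculated uncertainties, while values above 1 point to variability that the calculated uncertainty does not capture.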

The weighted mean in this case is (where *x*_{i} is a single measurement):

$$\bar x_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

The external standard deviation is calculated according to:

The external error in the mean *σ*_{m,ext} is calculated by dividing the external standard deviation (written here as $\sigma_{ext}$) by the square root of the number of measurements:

$$\sigma_{m,ext} = \frac{\sigma_{ext}}{\sqrt{n}}$$

The internal error in the mean *σ*_{m,int} (also called the standard error of the weighted mean, SEWM) is given by:

$$\sigma_{m,int} = \frac{1}{\sqrt{\sum_{i=1}^{n} w_i}}$$

Their ratio: