Introduction to this document

This document is a brief walkthrough of the methods and code used to generate the figures in our owl monkey pedigree sequencing paper. In this paper, we sequenced 30 owl monkey (Aotus nancymaae) individuals within 6 multi-generation pedigrees (14 trios) and calculated the mutation rate per generation (\(\mu_g\)) for this species. We found that there is a similar paternal age effect in owl monkeys as in humans, and that the underlying mutational machinery doesn’t seem to have changed much between owl monkeys, chimpanzees, and humans. Finally, we utilized a simple model of reproductive longevity to show that any observed variation in \(\mu_g\) between these species can be explained by the varying age of puberty and average age at reproduction.

For full descriptions see the paper:

Thomas GWC, Wang RJ, Puri A, Harris RA, Raveendran M, Hughes DST, Murali SC, Williams LE, Doddapaneni H, Muzny DM, Gibbs RA, Abee CR, Galinski MR, Worley KC, Rogers J, Radivojac P, Hahn MW. 2018. Reproductive longevity predicts mutation rates in primates. Current Biology. 28(19):3193-3197. 10.1016/j.cub.2018.08.050

To download this document, all accompanying code, and the data visit the following link: Owl monkey github repository


Figures 1 and S2

Owl monkey mutation rate

After filtering putative mutations (see Methods in paper) in the 14 owl monkey trios, we can calculate mutation rates (\(\mu_g\)) by taking into account false negative rates (\(\alpha\)) and the length of the callable genome (\(C\)) for each trio. Estimating a false negative rate allows us to correct the mutation rate for any true mutations that may have been removed by our filtering process. By using the distribution of allelic balance at all heterozygous SNP sites as an empirical CDF, we estimate \(\alpha = 0.44\) while removing mutations with allelic balance less than 0.4 or or greater than 0.6 (henceforth noted as [0.4,0.6]; see FigS1 and Methods in paper). We estimate \(C\) for each trio by counting the number of sites that pass the MV filters in each trio. Then, with the detected number of de novo mutations after filtering (\(m_g\)), the mutation rate for each trio is calculated as follows:

Equation 1: \[ \mu_g = \frac{m_g}{(1 - \alpha)*(2 * C)} \]

Mutation rates for all 14 trios were calculated in this fashion, with allelic balance cut-offs of [0.4,0.6] and the calculated value for \(\alpha\). This information is available in Table S3 of our paper.

in_data = read.csv("data/owl-monkey-filters.csv", header=TRUE)
cur_filter = subset(in_data, Min.DP==20 & Max.DP==60 & Min.AB==0.4 & Max.AB==0.6)
# Read data and define current filter

callable_sites = cur_filter$Sites - cur_filter$Uncallable.SNP.sites
mu_calc = cur_filter$MVs / ((1 - cur_filter$Min.FN) * (2 * callable_sites))
cur_filter$CpG.rate = cur_filter$CpG.MVs / (2 * callable_sites)
mu_table = data.frame(cur_filter$Trio, cur_filter$MVs, cur_filter$CpG.MVs, callable_sites, mu_calc)
names(mu_table) = c("Trio", "Mutations", "CpG Mutations", "Callable sites", "Mutation rate")
# Re-calculate the mutation rate for posterity and subset the data for the table.

kable(mu_table, "html", caption="Table 1: Mutation rates for 14 owl monkey trios", digits=11, align=rep('c', 5)) %>%
  kable_styling(bootstrap_options="striped", full_width=F)
Table 1: Mutation rates for 14 owl monkey trios
Trio Mutations CpG Mutations Callable sites Mutation rate
1 19 3 2209943580 7.640e-09
2 17 1 2208611258 6.840e-09
3 12 0 2204935186 4.840e-09
4 20 1 2206292059 8.060e-09
5 31 2 2205237095 1.250e-08
6 16 1 2211771942 6.430e-09
7 30 4 2208823079 1.208e-08
8 37 7 2211217581 1.488e-08
9 14 0 2214425687 5.620e-09
10 20 3 2214171927 8.030e-09
11 20 4 2205884529 8.060e-09
12 19 4 2205628589 7.660e-09
13 19 3 2201248359 7.670e-09
14 9 1 2198415883 3.640e-09

We want to know if these mutation rates are correlated with father’s age, similar to the paternal-age effect seen in humans. To answer this question, we perform a simple linear regression on mutation rates with father’s age at birth of the offspring (\(A_M\); purple dots and line). Additionally, we perform a similar analysis on non-replicative mutations and CpG sites (blue dots and line).

om_obs_reg = lm(mu_calc ~ cur_filter$Paternal.GT)
fig1 = ggplot(cur_filter, aes(x=Paternal.GT, y=Mutation.rate.corrected.for.FN)) + 
  geom_smooth(method='glm', color='#b66dff', fill="#f2e6ff", fullrange=T) +
  geom_point(color='#b66dff', size=3) +
  geom_smooth(aes(x=Paternal.GT, y=CpG.rate), method='glm', color="#6db6ff", fill=NA, fullrange=T) +
  geom_point(aes(x=Paternal.GT, y=CpG.rate), color="#6db6ff", size=3) +
  scale_x_continuous(breaks = seq(1, 15, by=2)) + 
  scale_y_continuous(breaks = seq(0, 1.8e-8, by=0.4e-8)) + 
  scale_color_manual(breaks=c("All mutations (observed)", "CpG mutations (observed)"), values=c("All mutations        (observed)"='#b66dff', "CpG mutations (observed)"="#6db6ff")) + 
  geom_vline(xintercept=1, linetype=3, color="grey", size=0.75) + 
  labs(x="Age of father at conception (y)",y="Mutation rate per site\nper generation") +
  theme_classic() +
  theme(axis.text=element_text(size=14), 
        axis.title=element_text(size=20), 
        axis.title.y=element_text(margin=margin(t=0,r=20,b=0,l=0),color="black"), 
        axis.title.x=element_text(margin=margin(t=20,r=0,b=0,l=0),color="black"),
        axis.line=element_line(colour='#595959',size=0.75),
        axis.ticks=element_line(colour="#595959",size = 1),
        axis.ticks.length=unit(0.2,"cm")
        #legend.position=c(0.1,0.87),
        #legend.text=element_text(size=18)
        )
print(fig1)

This results in the following formula for a best fit line (purple):

Equation 2: \[ \mu_g = 3.74\text{e-}9 + (A_M * 6.62\text{e-}10) \]

This means that with an average genome size of \(2.21\) billion base pairs that \(19.45\) mutations are accounted for by puberty at the age of 1, with an additional \(2.92\) mutations per year of the father’s life after puberty. As expected, non-replicative mutations are independent of father’s age (blue line).

Modeling mutation rates

Large-scale pedigree sequencing projects in humans1–10 have shown the importance of different life-stages in the determination of mutation rates. Models for predicting mutation rates generally account for the three important life stages in the mammalian germline11,12. These life stages are (1) female (F), (2) male before puberty (M0), (3) and male after puberty (M1). The relative contribution of each of these stages must be accounted for when estimating mutation rates per generation12 or per year11–13. Here, we re-frame this model in terms of reproductive longevity. Reproductive longevity depends on both the age of puberty in males (\(P_M\)) and the age of the male at conception of his offspring (\(A_M\)) and we find that it is the main determinant of mutation rate variation in primates. We define the value of reproductive longevity (\(RL\)) as:

Equation 3: \[ RL = (A_M - P_M) \]

\(RL\) therefore measures the amount of time mutations have accumulated post-puberty in a male, which only occurs during stage M1.

To see how reproductive longevity affects the per-generation mutation rate, \(\mu_g\), we must model the combined contribution from all life stages. In any given period of time, \(t\), the mutation rate due to errors in DNA replication, \(\mu_t\), is simply a product of the mutation rate per cell division, \(\mu_c\), and the number of cell divisions that occur, \(d_t\).

Equation 4: \[\mu_g = \mu_c * d_t \]

Since females (stage F) and pre-puberty males (stage M0) have a fixed number of cell divisions, their contribution to the mutation rate per-generation is constant and requires only the substitution of appropriate terms into equation 4.

Equation 5: Female contribution to \(\mu_g\) \[ \mu_{gF} = \mu_c * d_F \] Equation 6: Pre-puberty male contribution to \(\mu_g\) \[ \mu_{gM0} = \mu_c * d_{M0} \]

However, in males after puberty (stage M1) the number of cell divisions is a linear function of time, and the mutation rate per-generation in this life stage therefore depends on the yearly rate of cell division (\(d_{yM1}\)) and reproductive longevity (\(RL\)).

Equation 7: Post-puberty male contriubtion to \(\mu_g\) \[ \mu_{gM1} = \mu_c * d_{yM1} * RL \]

Finally, since an autosome will spend roughly half of its time in females and half in males, the mutation rate per generation (\(\mu_g\)) for a given species is the average of the male and female contributions.

Equation 8: \[ \mu_g = \frac{\mu_{gF} + (\mu_{gM0} + \mu_{gM1})}{2} \]

Estimating mutational parameters from humans

Empirical observations from developmental studies and large-scale pedigree data from humans inform us about some of the underlying mutational parameters of our model (equations 5, 6, and 7). For example, we use \(31\) and \(34\) as estimates for the number of cell divisions in human females (\(d_F\)) and males before puberty (\(d_{M0}\))14. We use \(16\) days as the length of a single spermatogenic cycle (\(t_{sc}\))15, which means we expect \(d_{yM1} = 23\) spermatogenic cycles to occur in a year if all spermatagonial cells are constantly dividing (but see below).

The remaining parameter of the model, \(\mu_c\), can be estimated from human pedigrees. We confirm the estimate of \(\mu_c\) made by Amster and Sella13 by using the \(\mu_{gF}\) observed in Kong et al.1 of \(14.2\), the number of female germline divisions, and rearranging equation 5:

Equation 9: \[ \mu_c = \frac{14.2}{31} \]

human_dm0 = 34; human_df = 31; human_num_f = 14.2; human_haploid = 2630000000;
# Observed human mutational parameters
human_mu_c = (human_num_f / human_df)
print(paste("mu_c estimated from Kong params: ", human_mu_c))
## [1] "mu_c estimated from Kong params:  0.458064516129032"
human_mu_c = human_mu_c / human_haploid
print(paste("mu_c per site estimated from Kong params: ", human_mu_c))
## [1] "mu_c per site estimated from Kong params:  1.74169017539556e-10"

We assume this rate is the same between females, males before puberty, and males after puberty (though see Methods of paper for discussion on this point). Therefore, to accomodate the observed \(2.01\) mutations per year of the father (\(\mu_{yM1}\))1, we must use the calculated \(\mu_c\) to make adjustments to the expected number of spermatogenic cycles. This stems from the observation that not all spermatagonial cells may be constantly dividing16,17 (see Methods of paper for full explanation).

Equation 10: \[ dy_{M1}^{Human} = \frac{\mu_{yM1}}{\mu_c} \ = \frac{2.01}{0.458} \]

human_sc = 16
human_dy1 = ((2.01 / human_haploid) / human_mu_c)
human_sp = human_dy1 / (365/16)
print(paste("Human expected # of spermatogenic cell divisions/year: ", human_dy1))
## [1] "Human expected # of spermatogenic cell divisions/year:  4.38802816901408"
print(paste("Human proportion of cells actively dividing given a cycle length of 16 days: ", human_sp))
## [1] "Human proportion of cells actively dividing given a cycle length of 16 days:  0.192351919737604"

It has been observed the New World monkeys have a lower rate of spermatogenesis than humans18, with \(t_{sc}^{Owl monkey} = 10.2\) days, approximately. We use the expected proportion of cells actively dividing in humans (\(p_{sc}^{Human} = 0.19\)) to estimate the expected number of spermatogenic cycles per year in owl monkeys.

Equation 11: \[ dy_{M1}^{Owl monkey} = (\frac{365}{t_{sc}^{Owl monkey}}) * p_{sc}^{Human} \]

om_sc = 10.2
om_dy1 = (365/om_sc) * human_sp
print(paste("Owl monkey expected # of spermatogenic cell divisions/year: ", om_dy1))
## [1] "Owl monkey expected # of spermatogenic cell divisions/year:  6.88318144159072"

The mutational model fits observed owl monkey data

We can use the mutational parameters estimated from human data above, along with \(t_{sc}^{Owl monkey}\) and an age of puberty of \(1\) year19 to predict mutation rates for a range of paternal ages (dashed purple line). What follows is an exact replication of Figure 1 in our paper.

om_puberty = 1
age_range = seq(om_puberty, 15, by=0.1)
rep_long = age_range - om_puberty
# The owl monkey age of puberty set at 1 year and a range of paternal ages

mu_gf = human_mu_c * human_df
# Equation 5

mu_gm0 = human_mu_c * human_dm0
# Equation 6

mu_gm1 = human_mu_c * om_dy1 * rep_long
# Equation 7

mu_g = (mu_gf + (mu_gm0 + mu_gm1)) / 2
# Equation 8

om_pred = data.frame(mu_g, age_range)
names(om_pred) = c("Mutation.rate", "Parental.age")

om_pred_reg = lm(mu_g ~ age_range)
fig1_final = fig1 + geom_line(data=om_pred, aes(x=Parental.age, y=Mutation.rate), linetype=2, size=0.75, color="#b66dff")
print(fig1_final)

The fit of our model predicts \(3.15\) mutations per year of the father after puberty, compared to the observed \(2.92\). An F-test on the residuals of the observed fit (solid purple line) and predicted fit (dashed purple line) shows that they are not significantly different.

cur_mu = mu_calc
cur_gt = cur_filter$Paternal.GT
#plot(cur_gt,cur_mu,ylim=c(min(cur_mu),max(cur_mu)))

#abline(om_obs_reg)
#segments(x0=cur_gt,y0=cur_mu,x1=cur_gt,y1=cur_mu-resid(om_obs_reg))

pred_actual = coef(om_pred_reg)[1] + cur_gt * coef(om_pred_reg)[2]
pred_resids = cur_mu - pred_actual
#abline(om_pred_reg, lty=2)
#segments(x0=cur_gt,y0=cur_mu,x1=cur_gt,y1=cur_mu-pred_resids, lty=2)

# This is the F-test for differences between the two lines in Fig.1
print(var.test(resid(om_obs_reg), pred_resids, alternative="two.sided"))
## 
##  F test to compare two variances
## 
## data:  resid(om_obs_reg) and pred_resids
## F = 0.99605, num df = 13, denom df = 13, p-value = 0.9944
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.3197569 3.1027444
## sample estimates:
## ratio of variances 
##          0.9960542

Changing the allelic balance filter has little effect on the fit of the model

By using a less stringent allelic balance filter of [0.3,0.7], we do reduce the false negative rate to \(\alpha = 0.29\), however we find that this has little effect on our fit. Though we observe \(2.63\) mutations per year of the father, a slightly lower estimate than with the more stringent filter. What follows is an exact replication of Figure S2.

cur_filter_s2 = subset(in_data, Min.DP==20 & Max.DP==60 & Min.AB==0.3 & Max.AB==0.7)
# Read data and define current filter

callable_sites_s2 = cur_filter_s2$Sites - cur_filter_s2$Uncallable.SNP.sites
mu_calc_s2 = cur_filter$MVs / ((1 - cur_filter_s2$Min.FN) * (2 * callable_sites_s2))
cur_filter_s2$CpG.rate = cur_filter_s2$CpG.MVs / (2 * callable_sites_s2)

figS2 = ggplot(cur_filter_s2, aes(x=Paternal.GT, y=Mutation.rate.corrected.for.FN)) + 
  geom_smooth(method='glm', color='#b66dff', fill="#f2e6ff", fullrange=T) +
  geom_point(color='#b66dff', size=3) +
  geom_smooth(aes(x=Paternal.GT, y=CpG.rate), method='glm', color="#6db6ff", fill=NA, fullrange=T) +
  geom_point(aes(x=Paternal.GT, y=CpG.rate), color="#6db6ff", size=3) +
  geom_line(data=om_pred, aes(x=Parental.age, y=Mutation.rate), linetype=2, size=0.75, color="#b66dff") +
  scale_x_continuous(breaks = seq(1, 15, by=2)) + 
  scale_y_continuous(breaks = seq(0, 1.8e-8, by=0.4e-8)) + 
  scale_color_manual(breaks=c("All mutations (observed)", "CpG mutations (observed)"), values=c("All mutations        (observed)"='#b66dff', "CpG mutations (observed)"="#6db6ff")) + 
  geom_vline(xintercept=1, linetype=3, color="grey", size=0.75) + 
  labs(x="Age of father at conception (y)",y="Mutation rate per site\nper generation") +
  theme_classic() +
  theme(axis.text=element_text(size=14), 
        axis.title=element_text(size=20), 
        axis.title.y=element_text(margin=margin(t=0,r=20,b=0,l=0),color="black"), 
        axis.title.x=element_text(margin=margin(t=20,r=0,b=0,l=0),color="black"),
        axis.line=element_line(colour='#595959',size=0.75),
        axis.ticks=element_line(colour="#595959",size = 1),
        axis.ticks.length=unit(0.2,"cm")
        #legend.position=c(0.1,0.87),
        #legend.text=element_text(size=18)
        )
print(figS2)

om_obs_reg = lm(mu_calc_s2 ~ cur_filter_s2$Paternal.GT)

cur_mu = mu_calc_s2
cur_gt = cur_filter_s2$Paternal.GT
#plot(cur_gt,cur_mu,ylim=c(min(cur_mu),max(cur_mu)))

#abline(om_obs_reg)
#segments(x0=cur_gt,y0=cur_mu,x1=cur_gt,y1=cur_mu-resid(om_obs_reg))

pred_actual = coef(om_pred_reg)[1] + cur_gt * coef(om_pred_reg)[2]
pred_resids = cur_mu - pred_actual
#abline(om_pred_reg, lty=2)
#segments(x0=cur_gt,y0=cur_mu,x1=cur_gt,y1=cur_mu-pred_resids, lty=2)

# This is the F-test for differences between the two lines in Fig.1
print(var.test(resid(om_obs_reg), pred_resids, alternative="two.sided"))
## 
##  F test to compare two variances
## 
## data:  resid(om_obs_reg) and pred_resids
## F = 0.99162, num df = 13, denom df = 13, p-value = 0.9881
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.318333 3.088928
## sample estimates:
## ratio of variances 
##          0.9916187

Figure 2

Primate mutational spectra

Based on the current study in owl monkeys, four studies in humans1,5,7,8, and one study in chimpanzees20, we can compare the mutational spectra across primates. What follows is Figure 2 from the paper.

spectra_data = read.csv("data/mutational-spectra.csv", header=TRUE)
om_spec = data.frame(spectra_data$Mutation, spectra_data$Owl.monkey.CpG.proportion, spectra_data$Owl.monkey.proportion)
names(om_spec) = c("Type", "CpG", "Non-CpG")
om_spec_m = melt(om_spec)

fig2_om = ggplot(om_spec_m, aes(Type, value, fill = variable)) + 
  geom_bar(stat = "identity") +
  scale_fill_manual("legend", values = c("Non-CpG"="#006ddb","CpG"="#db6d00")) +
  ggtitle("Owl monkey") + 
  labs(x="",y="Proportion of mutations") +
  theme_classic() +
  theme(axis.text=element_text(size=14), 
        axis.title.y=element_text(size=14), 
        axis.line=element_line(colour='#595959',size=0.75),
        axis.ticks=element_line(colour="#595959",size = 1),
        axis.ticks.length=unit(0.2,"cm"),
        legend.title=element_blank(),
        legend.position=c(0.1,0.87)
  )

human_spec = data.frame(spectra_data$Mutation, spectra_data$Human.average.proportion)
names(human_spec) = c("Type", "Human average")
human_spec_m = melt(human_spec)

fig2_human = ggplot(human_spec_m, aes(Type, value, fill = variable)) + 
  geom_bar(stat = "identity") +
  scale_fill_manual("legend", values = c("Human average" = "#920000")) +
  ggtitle("Human") + 
  labs(x="",y="Proportion of mutations") +
  theme_classic() +
  theme(axis.text=element_text(size=14), 
        axis.title.y=element_text(size=14), 
        axis.line=element_line(colour='#595959',size=0.75),
        axis.ticks=element_line(colour="#595959",size = 1),
        axis.ticks.length=unit(0.2,"cm"),
        legend.position="none"
  )

ch_spec = data.frame(spectra_data$Mutation, spectra_data$Chimp.proportion)
names(ch_spec) = c("Type", "Proportion")
ch_spec_m = melt(ch_spec)

fig2_chimp = ggplot(ch_spec_m, aes(Type, value, fill = variable)) + 
  geom_bar(stat = "identity") +
  scale_fill_manual("legend", values = c("Proportion" = "#920000")) +
  ggtitle("Chimpanzee") + 
  labs(x="",y="") +
  theme_classic() +
  theme(axis.text=element_text(size=14), 
        axis.title.y=element_text(size=14), 
        axis.line=element_line(colour='#595959',size=0.75),
        axis.ticks=element_line(colour="#595959",size = 1),
        axis.ticks.length=unit(0.2,"cm"),
        legend.position="none"
  )

g1 = arrangeGrob(fig2_human,fig2_chimp,nrow=1)
fig2 = plot_grid(fig2_om,g1, nrow=2, rel_heights=c(2, 1))
print(fig2)

We observe strikingly similar mutational spectra between primates. The only notable difference is a small, but significant increase in the number of A>T mutations in owl monkeys compared to humans. However, once the A>T class is removed, there is no significant difference between these species. This means that we see no evidence that the mutational machinery has changed drastically between these species.

print("--Chi-squared test using all mutation categories--")
## [1] "--Chi-squared test using all mutation categories--"
all_chi = chisq.test(spectra_data$Owl.monkey+spectra_data$Owl.monkey.CpG, p=spectra_data$Human.average.proportion)
print(all_chi)
## 
##  Chi-squared test for given probabilities
## 
## data:  spectra_data$Owl.monkey + spectra_data$Owl.monkey.CpG
## X-squared = 25.739, df = 5, p-value = 0.0001003
print("--Chi-squared test without the A>T category--")
## [1] "--Chi-squared test without the A>T category--"
human_noAT = spectra_data$Human.average.proportion[spectra_data$Mutation!="A>T"]
human_noAT = human_noAT / sum(human_noAT)
om_noAT = spectra_data$Owl.monkey[spectra_data$Mutation!="A>T"] + spectra_data$Owl.monkey.CpG[spectra_data$Mutation!="A>T"]
noAT_chi = chisq.test(om_noAT, p=human_noAT)
print(noAT_chi)
## 
##  Chi-squared test for given probabilities
## 
## data:  om_noAT
## X-squared = 5.8709, df = 4, p-value = 0.209

Figures 3 and S3

Modeling mutation functions in primates using human parameters

We then wanted to know how well the model fits other species with minimal changes to the underlying parameters. We used all parameters estimated from human data above (\(d_F\), \(d_{M0}\), \(d_{yM1}\) \(\mu_c\)) and varied only the age of puberty (\(P_M\)) to predict linear mutation functions for owl monkey and chimpanzee. For owl monkey we use \(P_M = 1\) year19, for chimp we used \(P_M = 7.5\) years21, and for humans we set \(P_M = 13.4\) years22. We then predicted mutation rates based on varying the duration of reproductive longevity by changing the male age of reproduction (lines). We compared these predictions to direct observations of mutation rates from studies done in humans1–3,5,6,8–10, chimpanzees20,23, and owl monkeys (points). What follows is an exact replication of Figure 3 from the paper.

human_puberty = 13.4
human_range = seq(human_puberty, 50.4, by=0.1)
human_rl = human_range - human_puberty

mu_gf = human_mu_c * human_df
# Equation 5

mu_gm0 = human_mu_c * human_dm0
# Equation 6

mu_gm1 = human_mu_c * human_dy1 * human_rl
# Equation 7

human_mu_g = (mu_gf + (mu_gm0 + mu_gm1)) / 2
# Equation 8

human_pred = data.frame(human_mu_g, human_range)
names(human_pred) = c("Mutation.rate", "Parental.age")
# The human predictions
#####
om_puberty = 1
om_range = seq(om_puberty, 16, by=0.1)
om_rl = om_range - om_puberty

mu_gf = human_mu_c * human_df
# Equation 5

mu_gm0 = human_mu_c * human_dm0
# Equation 6

mu_gm1 = human_mu_c * human_dy1 * om_rl
# Equation 7

om_mu_g = (mu_gf + (mu_gm0 + mu_gm1)) / 2
# Equation 8

om_pred = data.frame(om_mu_g, om_range)
names(om_pred) = c("Mutation.rate", "Parental.age")
# The owl monkey predictions
#####
chimp_puberty = 7.5
chimp_range = seq(chimp_puberty, 35.5, by=0.1)
chimp_rl = chimp_range - chimp_puberty

mu_gf = human_mu_c * human_df
# Equation 5

mu_gm0 = human_mu_c * human_dm0
# Equation 6

mu_gm1 = human_mu_c * human_dy1 * chimp_rl
# Equation 7

chimp_mu_g = (mu_gf + (mu_gm0 + mu_gm1)) / 2
# Equation 8

chimp_pred = data.frame(chimp_mu_g, chimp_range)
names(chimp_pred) = c("Mutation.rate", "Parental.age")
# The chimpanzee predictions
#####

study_data = read.csv("data/mutation-studies.csv", header=TRUE)
# Load the study data

fig3 = ggplot(study_data, aes(x=Paternal.age, y=Mutation.rate, color=Species)) +
    geom_line(data=human_pred, aes(x=Parental.age, y=Mutation.rate), color='#db6d00', size=1) +
    geom_segment(aes(x=human_puberty,y=human_mu_g[1]-human_mu_g[1]/10,xend=human_puberty,yend=human_mu_g[1]+human_mu_g[1]/10), linetype=1, color="#db6d00", size=0.5) +
    geom_line(data=om_pred, aes(x=Parental.age, y=Mutation.rate), color='#b66dff', size=1) +
    geom_segment(aes(x=om_puberty,y=om_mu_g[1]-om_mu_g[1]/10,xend=om_puberty,yend=om_mu_g[1]+om_mu_g[1]/10), linetype=1, color="#b66dff", size=0.5) +
    geom_line(data=chimp_pred, aes(x=Parental.age, y=Mutation.rate), color='#920000', size=1) +
    geom_segment(aes(x=chimp_puberty,y=chimp_mu_g[1]-chimp_mu_g[1]/10,xend=chimp_puberty,yend=chimp_mu_g[1]+chimp_mu_g[1]/10), linetype=1, color="#920000", size=0.5) +
    geom_point(size=3) +
    scale_x_continuous(limits=c(0,50), breaks = seq(1, 50, by=5)) + 
    labs(x="Paternal age (y)",y="Mutation rate per site\nper generation") +
    theme_classic() +
    guides(colour = guide_legend(override.aes = list(size=3))) +
    theme(axis.text=element_text(size=14), 
          axis.title=element_text(size=20), 
          axis.title.y=element_text(margin=margin(t=0,r=20,b=0,l=0),color="black"), 
          axis.title.x=element_text(margin=margin(t=20,r=0,b=0,l=0),color="black"),
          axis.line=element_line(colour='#595959',size=0.75),
          axis.ticks=element_line(colour="#595959",size = 1),
          axis.ticks.length=unit(0.2,"cm"),
          legend.title=element_blank(),
          legend.position=c(0.13,0.87),
          legend.text=element_text(size=18)
    ) +
    scale_color_manual(breaks=c("Owl monkey", "Chimpanzee", "Human"), values=c("Human"='#db6d00',"Owl monkey"='#b66dff',"Chimpanzee"='#920000'))

print(fig3)

Astoundingly, our predictions with this simple model of reproductive longevity (lines) fit the observed data extremely well (points). In fact, for the human data, the prediction is not significantly different than the best fit line to the study data.

# This is the F-test for differences between the predicted and observed human lines in Fig.3
human_data = data.frame(study_data$Paternal.age[study_data$Species=="Human"], study_data$Mutation.rate[study_data$Species=="Human"])
names(human_data) = c("Age","Mu")
human_data = na.omit(human_data)

human_study_reg = lm(human_data$Mu ~ human_data$Age)
human_pred_reg = lm(human_pred$Mutation.rate ~ human_pred$Parental.age)

pred_actual = coef(human_pred_reg)[1] + human_data$Age * coef(human_pred_reg)[2]
pred_resids = human_data$Mu - pred_actual
#plot(human_pred$Parental.age,human_pred$Mutation.rate, type='l',ylim=c(0.8e-8,1.8e-8), xlim=c(13.4,50))
#points(human_data$Age, human_data$Mu)
#segments(x0=human_data$Age,y0=human_data$Mu,x1=human_data$Age,y1=human_data$Mu-pred_resids)
#abline(human_study_reg, lty=2)
#segments(x0=human_data$Age,y0=human_data$Mu,x1=human_data$Age,y1=human_data$Mu-resid(human_study_reg),lty=2)
print(var.test(resid(human_study_reg), pred_resids, alternative="two.sided"))
## 
##  F test to compare two variances
## 
## data:  resid(human_study_reg) and pred_resids
## F = 0.93769, num df = 8, denom df = 8, p-value = 0.9297
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.2115133 4.1570385
## sample estimates:
## ratio of variances 
##          0.9376934

Using species specific rates of spermatogenesis does not affect the fit of our model

Like we did in Figure 1, we can adjust the slope of the line (\(d_{yM1}\)) by using species specific rates of spermatogenesis (\(t_{sc}\)). We’ve previously calculated this for humans and owl monkeys. For chimpanzees, we use set \(t_{sc} = 14\) days24.

chimp_sc = 14
chimp_dy1 = (365/chimp_sc) * human_sp
print(paste("Chimpanzee expected # of spermatogenic cell divisions/year: ", chimp_dy1))
## [1] "Chimpanzee expected # of spermatogenic cell divisions/year:  5.0148893360161"

And adjusting these slopes results in no significant change from the fit of our model. What follows is an exact replication of Figure S3 in our paper.

#####
om_puberty = 1
om_range = seq(om_puberty, 16, by=0.1)
om_rl = om_range - om_puberty

mu_gf = human_mu_c * human_df
# Equation 5

mu_gm0 = human_mu_c * human_dm0
# Equation 6

mu_gm1 = human_mu_c * om_dy1 * om_rl
# Equation 7

om_mu_g = (mu_gf + (mu_gm0 + mu_gm1)) / 2
# Equation 8

om_pred = data.frame(om_mu_g, om_range)
names(om_pred) = c("Mutation.rate", "Parental.age")
# The owl monkey predictions
#####
chimp_puberty = 7.5
chimp_range = seq(chimp_puberty, 35.5, by=0.1)
chimp_rl = chimp_range - chimp_puberty

mu_gf = human_mu_c * human_df
# Equation 5

mu_gm0 = human_mu_c * human_dm0
# Equation 6

mu_gm1 = human_mu_c * chimp_dy1 * chimp_rl
# Equation 7

chimp_mu_g = (mu_gf + (mu_gm0 + mu_gm1)) / 2
# Equation 8

chimp_pred = data.frame(chimp_mu_g, chimp_range)
names(chimp_pred) = c("Mutation.rate", "Parental.age")
# The chimpanzee predictions
#####

study_data = read.csv("data/mutation-studies.csv", header=TRUE)
# Load the study data

figS3 = ggplot(study_data, aes(x=Paternal.age, y=Mutation.rate, color=Species)) +
    geom_line(data=human_pred, aes(x=Parental.age, y=Mutation.rate), color='#db6d00', size=1) +
    geom_segment(aes(x=human_puberty,y=human_mu_g[1]-human_mu_g[1]/10,xend=human_puberty,yend=human_mu_g[1]+human_mu_g[1]/10), linetype=1, color="#db6d00", size=0.5) +
    geom_line(data=om_pred, aes(x=Parental.age, y=Mutation.rate), color='#b66dff', size=1) +
    geom_segment(aes(x=om_puberty,y=om_mu_g[1]-om_mu_g[1]/10,xend=om_puberty,yend=om_mu_g[1]+om_mu_g[1]/10), linetype=1, color="#b66dff", size=0.5) +
    geom_line(data=chimp_pred, aes(x=Parental.age, y=Mutation.rate), color='#920000', size=1) +
    geom_segment(aes(x=chimp_puberty,y=chimp_mu_g[1]-chimp_mu_g[1]/10,xend=chimp_puberty,yend=chimp_mu_g[1]+chimp_mu_g[1]/10), linetype=1, color="#920000", size=0.5) +
    geom_point(size=3) +
    scale_x_continuous(limits=c(0,50), breaks = seq(1, 50, by=5)) + 
    labs(x="Paternal age (y)",y="Mutation rate per site\nper generation") +
    theme_classic() +
    guides(colour = guide_legend(override.aes = list(size=3))) +
    theme(axis.text=element_text(size=14), 
          axis.title=element_text(size=20), 
          axis.title.y=element_text(margin=margin(t=0,r=20,b=0,l=0),color="black"), 
          axis.title.x=element_text(margin=margin(t=20,r=0,b=0,l=0),color="black"),
          axis.line=element_line(colour='#595959',size=0.75),
          axis.ticks=element_line(colour="#595959",size = 1),
          axis.ticks.length=unit(0.2,"cm"),
          legend.title=element_blank(),
          legend.position=c(0.13,0.87),
          legend.text=element_text(size=18)
    ) +
    scale_color_manual(breaks=c("Owl monkey", "Chimpanzee", "Human"), values=c("Human"='#db6d00',"Owl monkey"='#b66dff',"Chimpanzee"='#920000'))

print(figS3)

# This is the F-test for differences between the predicted and observed human lines in Fig.3
human_data = data.frame(study_data$Paternal.age[study_data$Species=="Human"], study_data$Mutation.rate[study_data$Species=="Human"])
names(human_data) = c("Age","Mu")
human_data = na.omit(human_data)

human_study_reg = lm(human_data$Mu ~ human_data$Age)
human_pred_reg = lm(human_pred$Mutation.rate ~ human_pred$Parental.age)

pred_actual = coef(human_pred_reg)[1] + human_data$Age * coef(human_pred_reg)[2]
pred_resids = human_data$Mu - pred_actual
#plot(human_pred$Parental.age,human_pred$Mutation.rate, type='l',ylim=c(0.8e-8,1.8e-8), xlim=c(13.4,50))
#points(human_data$Age, human_data$Mu)
#segments(x0=human_data$Age,y0=human_data$Mu,x1=human_data$Age,y1=human_data$Mu-pred_resids)
#abline(human_study_reg, lty=2)
#segments(x0=human_data$Age,y0=human_data$Mu,x1=human_data$Age,y1=human_data$Mu-resid(human_study_reg),lty=2)
print(var.test(resid(human_study_reg), pred_resids, alternative="two.sided"))
## 
##  F test to compare two variances
## 
## data:  resid(human_study_reg) and pred_resids
## F = 0.93769, num df = 8, denom df = 8, p-value = 0.9297
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.2115133 4.1570385
## sample estimates:
## ratio of variances 
##          0.9376934

Figure S4

Modeling mutation rates per year

To compare rates between species, mutation rates should be converted to an absolute time-scale: the mutation rate per year (\(\mu_y\)). This is useful when comparing long-term substitution rates (\(k\)) between species since the neutral mutation rate and the neutral substitution rate are theoretically equivalent (\(\mu_g = k\)). To calculate \(\mu_y\) we again average the mutational contributions from the three germline stages, but weight the average by the duration of time spent in each period: the duration of time spent in females (\(F\)), the duration of time spent in males before puberty (\(M_0\)), and the duration of time spent in males after puberty, previously defined as reproductive longevity (\(RL\)).

Equation 12: \[ \mu_y = \frac{\mu_{gF} + \mu_{gM0} + \mu_{gM1}}{F + M_0 + RL} \]

Using a range of female and male ages at reproduction and male ages of puberty, we can predict a parameter space for \(\mu_y\). What follows is Figure S4 from the paper.

colfunc <- colorRampPalette(c("#009292", "#920000"))

rt = c();nrt = c();
for(i in seq(5, 51, 1))
{
  for(j in seq(5, 51, 1))
  {
    rt = c(rt, i)
    nrt = c(nrt, j)
  }
}
mu_y = (((human_df * human_mu_c) + (human_dm0 * human_mu_c) + (human_dy1 * rt * human_mu_c)) / (rt + nrt));
# Calculate mu_y for a range of life history values

par(mar=c(5,5,4,2))
pred_data = interp(rt,nrt,mu_y)
filled.contour(pred_data, xlab='RL (years)', ylab=expression('F + M'[0]*' (years)'), cex.lab=2, col=colfunc(26),
               key.title = {par(cex.main=2);title(main=expression(mu[y]))},
               plot.axes = { 
                 contour(pred_data, drawlabels=T, axes=F, frame.plot=F, add=T, col="#999999", lwd=1.5, lty=3, labcex=1);
                 axis(1);
                 axis(2);
                 
                 points(16.3,43.1,pch=21,bg='#db6d00',col="white",cex=2)
                 # Kong human point
                 points(5.64,7.53,pch=21,bg='#b66dff',col="white",cex=2)
                 # Our owl monkey point
                 points(16.8,33.8,pch=21,bg='#920000',col="white",cex=2)
                 # Venn chimp point
               })

human_mu_y = (((human_df * human_mu_c) + (human_dm0 * human_mu_c) + (human_dy1 * 16.3 * human_mu_c)) / (29.7 + 13.4 + 16.3));
# Human mutation rate per year (Equation 12)

om_mu_y = (((human_df * human_mu_c) + (human_dm0 * human_mu_c) + (human_dy1 * 5.64 * human_mu_c)) / (6.53 + 1 + 5.64));
# Owl monkey mutation rate per year (Equation 12)

chimp_mu_y = (((human_df * human_mu_c) + (human_dm0 * human_mu_c) + (human_dy1 * 16.8 * human_mu_c)) / (26.3 + 7.5 + 16.8));
# Chimp mutation rate per year (Equation 12)

Again, using the underlying mutational parameters predicted from humans, we can predict \(\mu_y\) for humans (orange dot), owl monkeys (purple dot), and chimpanzees (red dot) by varying the ages of reproduction and puberty in these species. What jumps out is that \(\mu_y\) is not solely dependent on \(RL\), but also on the duration of the other life periods. Increasing the age of concesption for females (\(F\)) tends to decrease \(\mu_y\). Increasing the age of puberty in males increases the duration of \(M_0\) and also tends to decrease \(\mu_y\), though not as drastically. However, increasing \(RL\) by either decreasing age of puberty or increasing age of reproduction in males has more complicated effects. At low \(F + M_0\), increasing \(RL\) tends to decrease \(\mu_y\), while at high \(F + M_0\) increaseing \(RL\) tends to increase \(\mu_y\).


Appendix: Table of parameter definitions

in_data = read.csv("data/owl-monkey-filters.csv", header=TRUE)
cur_filter = subset(in_data, Min.DP==20 & Max.DP==60 & Min.AB==0.4 & Max.AB==0.6)
# Read data and define current filter

symbols = c(
  "$\\mu_g$",
  "$\\alpha$", 
  "$C$", 
  "$m_g$", 
  "$A_M$", 
  "$P_M$",
  "$RL$",
  "$\\mu_{gF}$",
  "$d_F$",
  "$\\mu_{gM0}$",
  "$d_{M0}$",
  "$d_{yM1}$",
  "$\\mu_c$",
  "$t_{sc}$",
  "$\\mu_{yM1}$",
  "$p_{sc}$",
  "$\\mu_y$"
  )

defs = c(
  "Mutation rate per site per generation", 
  "False negative rate", 
  "Callable sites", 
  "# of mutations per generation", 
  "Age of male at birth of offspring", 
  "Age of male at puberty (years)", 
  "Reproductive longevity",
  "Mutation rate per site per generation in females",
  "Number of cell divisions in females",
  "Mutation rate per site per generation in males before puberty",
  "Number of cell divisions in males before puberty",
  "Number of cell divisions per year in males after puberty",
  "Mutation rate per site per cell division",
  "Spermatogenic cycle length (days)",
  "Mutation rate per site per year in males after puberty",
  "Proportion of spermatagonial cells actively dividing",
  "Mutation rate per site per year"
  )

ests = c(
  paste("Owl monkey: ", signif(mean(mu_calc), digits=3)), 
  paste("Owl monkey with allelic balance cut-offs [0.4,0.6]: ", signif(cur_filter$Min.FN[1], digits=3)), 
  paste("Owl monkey average: ", floor(mean(callable_sites))), 
  paste("Owl monkey average: ", floor(mean(cur_filter$MVs))), 
  "NA", 
  "Owl monkey: 1 ^[^ [@dixson80]^]^, Human: 13.4 ^[^ [@nielsen86]^]^, Chimpanzees: 7.5 ^[^ [@marson91]^]^", 
  "NA",
  paste("Human: ", signif(human_num_f/human_haploid, digits=3), "^[^ [@kong12]^]^"),
  paste("Human: ", human_df, "^[^ [@drost95]^]^"),
  paste("Human: ", signif(human_mu_c * human_dm0, digits=3)),
  paste("Human: ", human_dm0, "^[^ [@drost95]^]^"),
  paste("Owl monkey: ", signif(om_dy1, digits=3), ", Human: ", signif(human_dy1, digits=3), ", Chimpanzee: ", signif(chimp_dy1, digits=3)),
  signif(human_mu_c, digits=3),
  paste("Owl monkey: ", om_sc, "^[^ [@derooij86]^]^, Human: ", human_sc, "^[^ [@heller63]^]^, Chimpanzee: ", chimp_sc, "^[^ [@smithwick96]^]^"),
  paste("Owl monkey: ", signif(coef(om_pred_reg)[2], digits=3)),
  paste("Human: ", signif(human_sp, digits=3)),
  paste("Owl monkey: ", signif(om_mu_y, digits=3), ", Human: ", signif(human_mu_y, digits=3), ", Chimpanzee: ", signif(chimp_mu_y, digits=3))
  )

param_table = data.frame(symbols, defs, ests)
names(param_table) = c("Symbol", "Definition", "Estimate")
# Re-calculate the mutation rate for posterity and subset the data for the table.


kable(param_table, "html", caption="Table 2: Parameter symbols, definitions, and estimates", digits=11) %>%
  kable_styling(bootstrap_options="striped", full_width=F)
Table 2: Parameter symbols, definitions, and estimates
Symbol Definition Estimate
\(\mu_g\) Mutation rate per site per generation Owl monkey: 8.14e-09
\(\alpha\) False negative rate Owl monkey with allelic balance cut-offs [0.4,0.6]: 0.438
\(C\) Callable sites Owl monkey average: 2207614768
\(m_g\) # of mutations per generation Owl monkey average: 20
\(A_M\) Age of male at birth of offspring NA
\(P_M\) Age of male at puberty (years) Owl monkey: 1 [19], Human: 13.4 [22], Chimpanzees: 7.5 [21]
\(RL\) Reproductive longevity NA
\(\mu_{gF}\) Mutation rate per site per generation in females Human: 5.4e-09 [1]
\(d_F\) Number of cell divisions in females Human: 31 [14]
\(\mu_{gM0}\) Mutation rate per site per generation in males before puberty Human: 5.92e-09
\(d_{M0}\) Number of cell divisions in males before puberty Human: 34 [14]
\(d_{yM1}\) Number of cell divisions per year in males after puberty Owl monkey: 6.88 , Human: 4.39 , Chimpanzee: 5.01
\(\mu_c\) Mutation rate per site per cell division 1.74e-10
\(t_{sc}\) Spermatogenic cycle length (days) Owl monkey: 10.2 [18], Human: 16 [15], Chimpanzee: 14 [24]
\(\mu_{yM1}\) Mutation rate per site per year in males after puberty Owl monkey: 5.99e-10
\(p_{sc}\) Proportion of spermatagonial cells actively dividing Human: 0.192
\(\mu_y\) Mutation rate per site per year Owl monkey: 1.19e-09 , Human: 4e-10 , Chimpanzee: 4.77e-10

References

1. Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–5 (2012). doi:10.1038/nature11396

2. Michaelson, J. J. et al. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151, 1431–42 (2012). doi:10.1016/j.cell.2012.11.019

3. Besenbacher, S. et al. Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nature Communications 6, 5969 (2015). doi:10.1038/ncomms6969

4. Francioli, L. C. et al. Genome-wide patterns and properties of de novo mutations in humans. Nature Genetics 47, 822–826 (2015). doi:10.1038/ng.3292

5. Besenbacher, S. et al. Multi-nucleotide de novo mutations in humans. PLoS Genetics 12, e1006315 (2016). doi:10.1371/journal.pgen.1006315

6. Girard, S. L. et al. Paternal age explains a major portion of de novo germline mutation rate variability in healthy individuals. PLoS One 11, e0164212 (2016). doi:10.1371/journal.pone.0164212

7. Goldmann, J. M. et al. Parent-of-origin-specific signatures of de novo mutations. Nature Genetics 48, 935–9 (2016). doi:10.1038/ng.3597

8. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nature Genetics 48, 126–133 (2016). doi:10.1038/ng.3469

9. Wong, W. S. et al. New observations on maternal age effect on germline de novo mutations. Nature Communications 7, 10486 (2016). doi:10.1038/ncomms10486

10. Jonsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017). doi:10.1038/nature24018

11. Segurel, L., Wyman, M. J. & Przeworski, M. Determinants of mutation rate variation in the human germline. Annual Reviews in Genomics and Human Genetics 15, 47–70 (2014). doi:10.1146/annurev-genom-031714-125740

12. Thomas, G. W. C. & Hahn, M. W. The human mutation rate is increasing, even as it slows. Molecular Biology and Evolution 31, 253–257 (2014). doi:10.1093/molbev/mst218

13. Amster, G. & Sella, G. Life history effects on the molecular clock of autosomes and sex chromosomes. Proceedings of the National Academy of Sciences of the United States of America 113, 1588–1593 (2016). doi:10.1073/pnas.1515798113

14. Drost, J. B. & Lee, W. R. Biological basis of germline mutation: comparisons of spontaneous germline mutation-rates among Drosophila, mouse, and human. Environmental and Molecular Mutagenesis 25, 48–64 (1995). doi:DOI 10.1002/em.2850250609

15. Heller, C. G. & Clermont, Y. Spermatogenesis in man: an estimate of its duration. Science 140, 184–186 (1963). doi:DOI 10.1126/science.140.3563.184

16. Plant, T. M. Undifferentiated primate spermatogonia and their endocrine control. Trends in Endocrinology and Metabolism 21, 488–495 (2010). doi:10.1016/j.tem.2010.03.001

17. Scally, A. Mutation rates and the evolution of germline structure. Philosophical Transactions of the Royal Society B: Biological Sciences 371, (2016). doi:ARTN 20150137 10.1098/rstb.2015.0137

18. Derooij, D. G., Vanalphen, M. M. A. & Vandekant, H. J. G. Duration of the cycle of the seminiferous epithelium and its stages in the rhesus-monkey (Macaca mulatta). Biology of Reproduction 35, 587–591 (1986). doi:DOI 10.1095/biolreprod35.3.587

19. Dixson, A. F., Gardner, J. S. & Bonney, R. C. Puberty in the male owl monkey (Aotus trivirgatus griseimembra): A study of physical and hormonal development. International Journal of Primatology 1, 129–139 (1980). doi:10.1007/BF02735593

20. Venn, O. et al. Strong male bias drives germline mutation in chimpanzees. Science 344, 1272–5 (2014). doi:10.1126/science.344.6189.1272

21. Marson, J., Meuris, S., Cooper, R. W. & Jouannet, P. Puberty in the male chimpanzee: progressive maturation of semen characteristics. Biology of Reproduction 44, 448–455 (1991). doi:DOI 10.1095/biolreprod44.3.448

22. Nielsen, C. T. et al. Onset of the release of spermatozoa (spermarche) in boys in relation to age, testicular growth, pubic hair, and height. Journal of Clinical Endocrinology & Metabolism 62, 532–535 (1986). doi:DOI 10.1210/jcem-62-3-532

23. Tatsumoto, S. et al. Direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing. Scientific Reports 7, 13561 (2017). doi:10.1038/s41598-017-13919-7

24. Smithwick, E. B., Young, L. G. & Gould, K. G. Duration of spermatogenesis and relative frequency of each stage in the seminiferous epithelial cycle of the chimpanzee. Tissue & Cell 28, 357–366 (1996). doi:Doi 10.1016/S0040-8166(96)80022-X