
How to insert a Jupyter Notebook into your WordPress post

June 21, 2021

I do a lot of work in Jupyter Notebooks, and I often find it useful to post them in their entirety in a blog post. The following method is what I use. It requires no additional plugins or programs, and lets you post the notebook as a non-interactive HTML element, in a Gutenberg block within your post.

Unfortunately, this doesn't work with all versions of Jupyter Notebooks. My personal setup is to use "Anaconda3-2019.10-Windows-x86_64.exe" from Anaconda.com, and the following instructions work for that version.

1. Open the Jupyter Notebook. Then from its menu bar, choose "File", "Download as", then "HTML (.html)"

2. Open the downloaded .html file of the notebook in a text editor.

3. Find the "<html>" tag in the file (without quotes). Delete it and the lines above it, and replace it with "<div>"

4. Likewise, find the "</html>" tag in the file. Delete it and any lines below it, and replace it with "</div>"

5. Search for the following block of text:

/*!
*
* IPython base
*
*/

It should appear around line 9000. Delete it and the lines above it, up to and including:

/*!
*
* Twitter Bootstrap
*
*/

which should appear near the top of the file.

6. In the line:

inlineMath: [ ['$','$'], ["\\(","\\)"] ],

replace each "$" with "$_$". Do the same in any markdown cell in the notebook where "$" is used for math. The exact set of new characters can be anything, as long as it doesn't appear elsewhere on your post.

7. Look for the following block of text:

/* Overrides of notebook CSS for static HTML export */
body {
overflow: visible;
padding: 8px;
}

In it, delete the whole line that contains "padding: 8px;"

8. Start editing your WordPress post, and insert a "Custom HTML" block where you want the Jupyter Notebook to appear.

9. Take the text-file html version of the Jupyter Notebook you edited above, and copy-paste it into the custom HTML block. Then, click on the "options" icon in the block menu (three vertical dots), and choose "Convert to Blocks"

10. If you later get an error message saying "This block contains unexpected or invalid content", simply click on the "Attempt Block Recovery" button.
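
If you post notebooks regularly, steps 3 through 7 are mechanical enough to script. Here is a minimal sketch in Python, assuming the exported file is named "notebook.html" and the output "notebook_block.html" (both placeholder names), and that the marker strings appear exactly as quoted in the steps above; your export may differ, so treat this as a starting point rather than a drop-in tool.

from pathlib import Path

src = Path("notebook.html")        # the file exported in step 1 (placeholder name)
dst = Path("notebook_block.html")  # the file to paste in step 9 (placeholder name)

lines = src.read_text(encoding="utf-8").splitlines()

def index_of(marker, start=0):
    # return the index of the first line at or after `start` containing `marker`
    for i in range(start, len(lines)):
        if marker in lines[i]:
            return i
    raise ValueError("marker not found: " + marker)

# Steps 3 and 4: keep only what is between <html> and </html>, wrapped in a <div>.
start = index_of("<html>")
end = index_of("</html>", start)
lines = ["<div>"] + lines[start + 1:end] + ["</div>"]

# Step 5: drop everything from the "Twitter Bootstrap" comment through the
# "IPython base" comment (this assumes both comment blocks have the exact
# five-line shape quoted in the steps above).
bootstrap = index_of("* Twitter Bootstrap")
ipython_base = index_of("* IPython base", bootstrap)
del lines[bootstrap - 2 : ipython_base + 3]

# Step 6: change the MathJax "$" delimiters so WordPress leaves them alone.
# Any markdown cells that use "$" for math still need the same replacement,
# done by hand or with a similar search-and-replace.
for i, line in enumerate(lines):
    if "inlineMath" in line:
        lines[i] = line.replace("$", "$_$")

# Step 7: remove the "padding: 8px;" line from the static-export overrides.
overrides = index_of("Overrides of notebook CSS")
for i in range(overrides, min(overrides + 6, len(lines))):
    if "padding: 8px;" in lines[i]:
        del lines[i]
        break

dst.write_text("\n".join(lines), encoding="utf-8")

The output file is then what you copy-paste into the Custom HTML block in step 9.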

That's it! The notebook will be rendered in the post where you placed the block. An example appears below, using a notebook from my post on the "Bayesian evaluation for the likelihood of Christ's resurrection".

resurrection_calculation_official_version_3

First, we import some modules:

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
from scipy.stats import genpareto, binom

Next, we write a function to simulate getting the maximum value out of n samples from a given distribution:

In [2]:
def max_out_of_n_from_dist(dist, out_of_n=1e9):
    # Drawing out_of_n (e.g. 1e9) samples directly is too slow, so for large
    # out_of_n we approximate the maximum by drawing manageable_n uniform tail
    # probabilities in (0, manageable_n / out_of_n) and inverting the smallest.
    manageable_n = 100000
    if out_of_n <= manageable_n:
        return dist.rvs(int(out_of_n)).max()
    else:
        top_percentiles = \
            np.random.rand(manageable_n) * manageable_n / out_of_n
        return min(
            dist.isf(top_percentiles.min()),
            np.finfo(float).max / 100)  # prevent inf

Next, we consider generalized Pareto distributions with shape parameters from 0.01 to 2.0, in increments of 0.01. That is, we consider shape parameters of 0.01, 0.02, 0.03 ... 2.0.

We then simulate getting the maximum value out of 1e9 samples drawn from these distributions, and rescale the distribution so that this maximum value is at x = 1. Then we calculate the probability density at x = 6. This is the likelihood of drawing a sample at 6 times the maximum value - that is to say, the likelihood of "naturally" generating a Jesus-level resurrection report from this distribution. Lastly, we calculate how many "outliers" would exist in this distribution.

We repeat this 10000 times for each of the 200 shape parameters between 0.01 and 2.0, and put it all in a table. The result is a table with 2,000,000 rows, whose columns are the shape parameter, the likelihood of drawing a sample 6 times greater than the maximum, and the number of "outliers".

The following code gives us this results table.

In [3]:
sample_size = int(1e9)
genpareto_shapes = np.linspace(0.01, 2.0, 200)
shape_params = []
pdf_6x_max = []
n_outliers = []

for shape_param in genpareto_shapes:
    dist = genpareto(shape_param, scale=1, loc=0)
    for i in range(10000):
        shape_params.append(shape_param)
        max_val = max_out_of_n_from_dist(dist, sample_size)
        scaled_dist = genpareto(
            shape_param, scale=1 / max_val, loc=0)
        pdf_6x_max.append(scaled_dist.pdf(6))
        # use binom instead of actually drawing for outliers
        p_outlier = (
            scaled_dist.sf(0.5) - scaled_dist.sf(1)
        ) / scaled_dist.cdf(1)
        n_outlier = \
            binom(sample_size - 1, p_outlier).rvs(1)[0] + 1
        n_outliers.append(n_outlier)

genpareto_results_df = pd.DataFrame({
    "shape_params": shape_params,
    "pdf_6x_max": pdf_6x_max,
    "n_outliers": n_outliers,
})

# save to .csv, as generating this takes a while
genpareto_results_df.to_csv(
    "genpareto_results_df.csv", encoding="utf-8")

Let's load up the results and see the first few rows:

In [4]:
genpareto_results_df = pd.read_csv(
    "genpareto_results_df.csv", encoding="utf-8"
).drop("Unnamed: 0", axis=1)

In [5]:
print(genpareto_results_df.shape)
genpareto_results_df.head()

(2000000, 3)

Out[5]:
   shape_params    pdf_6x_max  n_outliers
0          0.01  1.400046e-36       26287
1          0.01  1.650500e-38       11942
2          0.01  4.315781e-36       31962
3          0.01  2.588130e-37       19289
4          0.01  3.339785e-39        8680

So, let's say that in reality, there are only 15 "outliers". Now, this does not narrow down the possibilities to a single shape parameter. Just due to chance, you can get 15 "outliers" from a shape parameter of 0.3, and also from a shape parameter of 0.6. However, the 15 "outliers" do narrow things down enough to give us a distribution over shape parameters. This is an improvement over our prior knowledge, which was that we had no idea what the shape parameter might be.

How could we get this posterior distribution of the shape parameters? All we need to do is to take the subset of the results table where the number of outliers is exactly 15, and look at the shape parameters. This satisfies Bayes' theorem, and the distribution of shape parameters in that subset IS the posterior distribution for the shape parameter.
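
To spell out the Bayes' theorem step (the notation here is mine, not the notebook's): write $\xi$ for the shape parameter and $N$ for the number of "outliers". Every value of $\xi$ appears in the table equally often (10000 rows each), so the table encodes a uniform prior over $\xi$, and

$$ p(\xi \mid N = 15) \;\propto\; p(N = 15 \mid \xi)\, p(\xi) \;\propto\; p(N = 15 \mid \xi) $$

Rows with exactly 15 outliers therefore occur with frequency proportional to $p(N = 15 \mid \xi)$, which is why the histogram of shape parameters in that subset is the posterior distribution.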

Let's write a function to do this and see what it looks like:

In [6]:
def shape_hist(df, n_outliers):
    # histogram of shape parameters over the rows with exactly n_outliers
    # outliers, using one bin per 0.01 step in the shape parameter
    srs = df[df["n_outliers"] == n_outliers]["shape_params"]
    n_bins = (srs.max() - srs.min()) / 0.01 + 1
    srs.hist(bins=int(n_bins)).set_xlabel("shape parameters")

In [7]:
shape_hist(genpareto_results_df, 15)
  

So, our "skeptic's distribution" in this case is a linear combination of generalized Pareto distributions, with the above distribution of shape parameters, each one scaled individually depending on where its maximum value was found.

Continuing on with the same reasoning, the "skeptic's likelihood" of generating a Jesus-level resurrection report would simply be the density of that distribution at x = 6. Since we tracked this value in the above table, again all we have to do is take the subset of the table where the number of outliers is exactly 15, and average over this likelihood value.
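
In the same notation as above (again mine), the skeptic's density at $x = 6$ is a Monte Carlo estimate of the mixture density:

$$ p_\text{skeptic}(6) \;\approx\; \frac{1}{|S|} \sum_{i \in S} \left(\text{pdf\_6x\_max}\right)_i, \qquad S = \{\text{rows with exactly 15 outliers}\} $$

which is exactly what taking the mean of the pdf_6x_max column over that subset computes.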

Furthermore, the Bayes factor is simply the likelihood from the "Christian's Distribution" (which is just 0.5/12 for values up to x = 12) divided by this skeptic's likelihood.

As before, let's write a function to calculate these values, to see what they are with 15 outliers:

In [8]:
def likelihood_and_bf(df, n_outliers):
    # average density at x = 6 over all rows with exactly n_outliers outliers
    skeptical_likelihood = df[
        df["n_outliers"] == n_outliers]["pdf_6x_max"].mean()
    christian_likelihood = 0.5 / 12
    bf = christian_likelihood / skeptical_likelihood
    print("skeptic's likelihood: {:.2e}"
          .format(skeptical_likelihood))
    print("Bayes factor: {:.2e}".format(bf))

In [9]:
likelihood_and_bf(genpareto_results_df, 15)
  
skeptic's likelihood: 8.51e-12
Bayes factor: 4.90e+09
  

Note that the above distribution of shape parameters decays to nearly nothing by the time we reach values around 1.0. So we don't have to worry about our artificial upper limit of 2.0. There is essentially no chance for the shape parameter to be that high, given 15 outliers.

Of course, there are almost certainly more than 15 outliers in world history. This further forces the shape parameters to smaller values. In addition, this distribution over shape parameters turns out to decay quite rapidly as it goes off to the right (demonstrating this is left as an exercise for the reader). All this combines to show that the upper limit of 2.0 will have no bearing on our conclusions.

So, what if there are more outliers, like 60? That would make for the following posterior distribution over shape parameters:

In [10]:
shape_hist(genpareto_results_df, 60)
  

And the Bayes factor for a Jesus-level resurrection report would be:

In [11]:
likelihood_and_bf(genpareto_results_df, 60)
  
skeptic's likelihood: 3.86e-13
Bayes factor: 1.08e+11
  

What if there are 250 outliers? Then the distribution over shape parameters looks like this:

In [12]:
shape_hist(genpareto_results_df, 250)
  

And the Bayes factor for a Jesus-level resurrection report would be:

In [13]:
likelihood_and_bf(genpareto_results_df, 250)
  
skeptic's likelihood: 1.44e-14
Bayes factor: 2.90e+12
  

We can clearly see that the number of "outliers" controls the "skeptic's likelihood" of generating a Jesus-level resurrection report. Here is how the two quantities are related:

In [14]:
outliers_pdf_6x = genpareto_results_df[
    genpareto_results_df["n_outliers"] < 100
].groupby("n_outliers")["pdf_6x_max"].mean()
outliers_pdf_6x.name = "pdf_6x_max"

outliers_pdf_6x.reset_index().plot(
    kind="scatter", x="n_outliers", y="pdf_6x_max",
    xlim=(0, 100), ylim=(0, 6e-11),
    title="skeptic's likelihood vs outliers"
)

Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x1b7391adda0>

And this is how that translates into Bayes factors for Jesus's resurrection:

In [15]:
outliers_log10_bf = np.log10((0.5 / 12) / outliers_pdf_6x)
outliers_log10_bf.name = "log10_bf"

outliers_log10_bf.reset_index().plot(
    kind="scatter", x="n_outliers", y="log10_bf",
    xlim=(0, 100), ylim=(8, 14),
    title="Bayes factors vs outliers"
)

Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1b737c4d358>

The abnormal values around n_outliers < 5 are due to the "shape parameter exceeding 2.0" problem mentioned earlier: with so few outliers, the artificial upper limit of 2.0 actually matters. It quickly becomes a non-issue as the number of outliers increases.

Otherwise, looking at the graphs above, we see that the skeptic's likelihood of generating a Jesus-level resurrection report drops as the number of "outliers" increases, thereby increasing the Bayes factor. Having MORE non-Christian resurrection reports (that is, having more "outliers") makes the skeptic LESS able to explain Jesus's resurrection, and therefore makes it MORE likely - exactly as we said before.
