Updated Optimization Plots, WPAFB Experiment

Fuchs Optimization

Proton Energy Conversion

Since last time, I created some better “banana” plots to showcase the simple optimization task we are trying to accomplish. The optimization function I am using is:

\begin{equation} f = \frac{\lvert KE_\text{MAX} - KE_\text{cutoff} \rvert}{KE_\text{cutoff}} + \frac{C}{\eta_p} + \text{PENALTY} \label{eq:objective} \end{equation}

where $C = {4.5 \cdot 10^{-3}, 1.8 \cdot 10^{-3}, 0.8 \cdot 10^{-3}}$ changes based on the cutoff energy $KE_\text{cutoff} = {1, 0.5, 0.25}$ MeV.

Here is what the optimization task looks like on the analytical fuchs data

fuchs_banana

The gray lines enclose the region that doesn’t have a penalty added to the objective function. The red line is the locus of points that match the exact energy cutoff (excluding the second term in eq \ref{eq:objective}). The green region prefers a higher laser to proton energy conversion efficiency (excluding the first term in eq \ref{eq:objective}). The blue region approximately equally weights the proton energy conversion efficiency with the criterion of being close to the energy cutoff (includes all terms with $C$ being the value that makes the first two terms equally important). The blue and green stars shows the optimal points (lowest objective function value) and all the green and blue points are within 5 $\%$ of this cutoff.

Below, is another plot that shows the performance of SVR, GPR, NN trained on 2000 points with 30$\%$ added gaussian noise. I used 100,000 points evaluated on these models to generate these banana plots

noise=30

As expected, the NN doesn’t perform very well. Interestingly, we can see that the SVR and GPR predict lower target thicknesses than it should, this may be in part the log-scaling that makes these models under-predict the energy spectrum parameters.

Back to Number of Protons

I’ve redone this optimization problem using $N_p$ instead of $\eta_p$ where $N_p$ is the number of protons. Accordingly, I adjusted $C = {9 \cdot 10^{9}, 5.4 \cdot 10^9, 3.6 \cdot 10^9}$. Additionally, I trained using 20,000 points on the SVR and 20,000 points on the NN. The GPR ran into memory issues, so I only used 4,000 points.

fuchs_banana

Here, we can see the optimal parameters don’t hug the left gray curve quite as much. Here are the results with 30 $\%$ added noise for the ML models

noise=30

Confusingly, the SVR seems to do worse with more data. I need to look into what is going on here.

Another look at noise model and scaling

One thing I realized was our log-scaling skews the predictions of a ML model when there is a lot of noise. For example, I trained a polynomial regression (degree 5) on Fuchs Data and reported the mean absolute percentage error for varying levels of noise

image

Here, in magenta, I have the error calculated exactly from assuming a gaussian distribution, finding the expected value of log(x) in that distribution, and exponentiating that to compare with just the mean of the original distribution. I did the similar thing with the cyan line for the log gaussian distribution (which is what I trained the model on). It seems like this is a big limitation of the log scaling.

Here is the result using a 1 degree (linear) polynomial

image

And with a degree 3 polynomial

image

However, as a comparison, I did the same analysis without the log scaling and it looks like the polynomial regeression performs extremely poorly (for the degree 5 polynomial)

image

so, it seems, at least for the polynomial regression that a logarithmic scaling on the energies is necessary. Specifically with the log-gaussian distribution, this under-predicting bias is directly related to the median vs the mean of the log-gaussian distribution with differ by a multiplicative factor of $\sqrt{1+\alpha^2}$ where $\alpha$ is the noise level fraction.

Doing Grid Searches

I have been working to update the Neural Network Model and other Models by doing the grid searches more formally in a way that can be included with the code published on Zenodo. I have switched to Jack’s skorch implementation which can use scikit-learn’s GridSearchCV functionality. For the SVR, I built my own 5-fold-CV and GridSearch which took a few hours to implement. I have made plots of the NN grid search MSE with different model architectures which should be better plots to replace what we had in the appendix of the paper.

To Do

  • Make sure I am training all models correctly and doing the optimization correctly
  • Look into the Invertible Neural Network Model.
  • Update Grid Searches for all methods
Written on February 16, 2024