Summary of Machine Learning with Fuchs Data
Explanation of Fuchs Model
I have previously explained much of the Fuchs Model here. The inputs of this model are as follows:
- Laser Intensity (W/cm2): around $10^{17} \rightarrow 10^{19}$
- Focal Distance ($\mu m$): around $-30 \rightarrow 30$
- Target Thickness ($\mu m$): around $0.5 \rightarrow 10$
There are some other inputs that were kept as constants
- Spot Size: 2 $\mu m$
- Pulse Duration (FWHM): 40 fs
and another important parameter would be the total laser energy which can be computed from the laser intensity, spot size, and pulse duration also found here.
The Total Proton Energy and Number of Protons would be related to the formula for dN/dE in the Fuchs Paper, and Tom’s code has the implementation of how to get those things. The Paper also takes the energy in protons divided by the energy of the laser to obtain the laser conversion efficiency which is something that we’ll find in the datasets as a useful quantity, but is not a parameter in the model.
The outputs that we are trying to predict are
- Total Proton Energy
- Max Proton Energy
- Average Proton Energy
Where the average proton energy could be obtained from the Fuchs Model by taking the total proton energy and dividing by the number of protons.
Data Preprocessing
With the dataset, we specify x% of the data to be training data nd (100-x)% to be testing. Tom and Pedro varied this from x = 25 to x = 95.
Then, we use a PyTorch DataLoader to organize the data in batches and take care of shuffling. Below, I will explain some of the Deep Learning terminology with an example dataset with 1,000 points.
- Batch Size: By default, we would set the batch size to be the number of training points. This would mean that we would train our neural network over the whole training set and update the model hyper parameters at the end of this one pass. However, with a large number of data points, this may take a long time. Instead, we can specify the batch size to be 100 so that we would train the model with a smaller amount of data of 100 points. But, we would still use all 1000 points by updating the model parameters after each batch of 100 so that we do 10 total iterations. In general, a large batch size is better, but a smaller batch size would run faster.
- Epochs: In general, one pass through the dataset is not enough for our neural network. We can think of this like Newton’s Method for finding the minimum of a function (which is essentially what Neural Networks do for the Gradient-Descent Optimization). After only one iteration of Newton’s Method, we are closer to the answer, but we should expect to do at least several iterations to converge to some minimum. In the case of the neural network, the function we are minimizing is the mean squared error (due to specifying the “MSELoss” Loss Function, but there are other things we can minimize). In general, more epochs are desired, but after a certain point we will have converged and have no need to continue the training.
Another thing to mention is the scaling of the data. The Intensities are very large and it does not make sense to use them as is. It would make more sense to take the logarithm of the intensity for training purposes, and then later exponentiate them to recover an intensity. Additionally, we might want to scale the values so that they are all approximately the same order of magnitude by subtracting by a constant and multiplying by a constant. For example, we can convert some data point $x \rightarrow (x - \bar{x})/\sigma$ to make the values normally distributed. Or, we can choose the scale in such a way that the maximum is 1 and the minimum is 0. This sort of scaling is done later in the forward pass with batch normalization.
Model Training
The first step would be to apply the scaling. Then we will do the following for each epoch.
- Zero the gradients which are used to inform how we change the neural network hyperparameters
- Forward Pass: Evaluate the current model on the inputs
- Compute the Loss using the Loss Function from the model outputs and the true outputs (usually MSE)
- Backward Pass: Calculate gradients based on the Loss.
- Using the gradients, calculate the new hyper-parameters (stepping forward with the optimizer). We can think of this as like a linear approximation by taking $f(x) = f(x_0) + f’(x) \Delta x$ where $f’(x)$ is the “gradient”, $f(x_0)$ is the old hyper-parameter value and $\Delta x$ is related to the learning rate which will tell us how far we should jump at each iteration.
- Repeat steps 2-5 for each batch. After finishing all the batches, we have completed one epoch and we start again at step 1 for the next epoch.
Forward Pass
In the forward pass, we will apply a sequence of transformations to the input data.
- First, we apply the Batch Normalization to input data with dimension $n_x$.
- Next, we take a set of weights and multiply them by the input values to obtain a new set of values with dimension $n_1$. This weight matrix will have dimension $(n_1, n_x)$.
- Then, we apply an activation function to these new values which gives us the $n_1$ values in the first hidden layer. Optionally, we can apply another BatchNorm before the activation function
- Repeat this proces for each additional hidden layer (weighting function, batch norm, activation function) where the successive layers will have $n_2$, $n_3$ … features.
- At the end, do a final weighted sum to go back from $n_N$ features to $n_x$ output features.