Change Log Scaling, NN Noise
Modifying Tom’s Log Scaling
In Tom’s Process class, he pre-processed the intensity and three output energies in the following way. First, he divided each data point value by some scale factor that is somewhat around the typical values in the dataset. As an example from his file $\texttt{Model_Tuning_OSC_35_Epochs_CSV.py}$, he used ${1e19, 1e0, 1e8, 1e-1}$ for the intensity, max energy, total energy, and average energy scales. Then he simply took a natural logarithm of the data. The other two input variables (thickness and focal distance) were left unchanged. To me, this seems a problem because it is not very standardized. The typical approach to preprocessing involves something like $\texttt{StandardScaler}$ which normalizes the data points by subtracting by a mean and dividing by a standard deviation. If we are going to log the dataset, we should do it before the standard scaler transformation is applied. I implemented some different versions of this scaling and I tested them. I trained a 10,000 point dataset (8,000 training, 2,000 testing) with GP and compared the percent error with each method.
First, I used Tom’s Scaling
Then, I took a logarithm of the intensity, and then applied standard scaler to all inputs/outputs
Finally, I just applied standard scaler to all inputs/outputs without taking any logarithms.
It appears that Tom’s way of scaling the data performs worse. As a result, I think just sticking to the standard scaler preprocessing should work fine for the most part. Now, I have implemented multiple different ways of preprocessing, so I can always change how I do it in the future.
Testing Data
In Tom’s Code, he only uses 10 percent of the testing data when checking error metrics
I just changed the code so that it uses the full testing dataset. This explains why sometimes I get lower accuracy when I have a high training percentage split. A low testing percentage split combined with only using 10% of the testing data means that we may not be using an adequate number of testing points in the error calculations.
NN Noise
The Gaussian Process has an inbuilt measure of the noise which is just the standard deviation of a confidence interval which is briefly outlined in the last post. This is measured at each data ponit and I just took the average among all data points to get a singular value for the noise. I tried comparing this to a measure of the noise in the neural network model. Effectively, I ran the NN model 30 times and computed the mean squared error between all the predictions and the mean of the 30 predictions for each point. Similarly, I (first take the square root) averaged all the values for each point in the dataset to get a singular value for the noise. Below, I have some graphs showing this
I have depicted the mean absolute percentage error in the cyan and magenta dashed lines and the percent error estimate as 7.98*(noise/10) in yellow. We can see that both measures for noise are not close at all for the GP and for the NN. The NN noise remains relatively constant even when I change the dataset to one that contains higher gaussian noise. Also, in doing this analysis, I found the Gaussian Process to not work very well for noise level at 30 percent. This could mean that I am not training the model for long enough or that I am not using optimal hyperparameters so that the regression is stopping early.
But on further examination, I changed the learning rate from $\texttt{1e-2}$ to $\texttt{1e-3}$ and the stopping tolerance from $\texttt{1e-3}$ to $\texttt{1e-4}$ to make the gaussian process regression run slower and longer to hopefully get better results. I still ended up getting the same peculiar inaccuracy for a gaussian noise level of 30 pct.
However, the Neural Network is able to match up to the percent error of expected error of around 24pct, while the GP is stuck at 37pct. Maybe this is a fundmanetal limitation of the Gaussian Process Model.
Upon further inspection, I just realized that I didn’t filter out the negative energy values from the dataset. I computed the mean squared error and mean absolute percentage error between the noisy and noiseless data for noise levels between 2 and 40% and found that at around 20-25% gaussian noise we start seeing much higher errors than expected.
This is because at this gaussian noise level, getting a negative energy reading would be a 4 or 5 $\sigma$ result which starts to become probable when we have a dataset of thousands of points. To mitigate this, I need to implement some cutoff that restricts the amount of added noise to at most a fraction of the exact fuchs energy value so that we never subtract more than the initial value to obtain the noisy value.
Things to Do
- Fix Dataset generation to avoid negative values
- Item 2