Noise in GP, changed SVR params, Using Git
Testing Different GP Models
I trained a GP Model on 100, 1000, and 10000 point datasets (80% training 20% testing) and stored the models to a file. Then, I loaded in those models and used them to predict all the values of the 1000 point dataset. I also just plotted the untrained GP model prediction as well. Below, I have the results, where I’ve plotted the predicted max proton energy from the model in comparison to the exact max proton energy from the noiseless fuchs model (magenta, dashed) vs the thickness.
The light blue shaded region is the confidence region from the gaussian process model (2 $\sigma$ above and below where $\sigma$ is the standard deviation or noise at each point). We can see that the untrained model has a very low confidence, but all the rest of the models have very high confidence and produce good agreement with the fuchs model. Labeled on the graphs are the Noisy Error (percentage error between noisy fuchs model and GP model) and Exact Error (percentage error between noiseless fuchs model and GP model). We see that using fewer data points actually lowers the error with the noiseless fuchs model. This might be because if we are using too many data points, we are just overfitting to the noise, when in reality we want to focus on the underlying true noiseless model.
I took the average value of the noise $\sigma$ and reported it on the graphs as well. We can see that $\sigma$ is close to the value of RMSE (root mean square error) which indicates that the noise level in the gaussian process model can be used as an estimate for the RMSE with the noisy fuchs data. Additionally, I converted this to a percent error estimate by dividing $\sigma$ by the max proton energy prediction from the model, taking the average at each point, and multiplying by $\sqrt{\frac{2}{\pi}}$ for the conversion from gaussian error to MAPE.
Finding better SVR params
I had thought that the SVR was not running very efficiently, since the Gaussian Process was doing much better when I didn’t have a reason to suspect it should do much better. I changed some of the SVR hyperparameters as follows:
- Kernel: Poly $\rightarrow$ RBF
- Tolerance: 0.001 $\rightarrow$ 0.01
- Gamma: auto $\rightarrow$ scale
- C: 10 $\rightarrow$ 1
The change to the RBF kernel made a large difference, as well as lowering the tolerance for the stopping criterion. I just changed Gamma and C back to their default values that the rapids SVR algorithm uses. In doing so, I found the SVR to train the Fuchs Data very fast: 100,000 training points in less than 10 seconds (9.96s).
with almost non-existent percent error
This easily clears the goal of training 100,000 training points in less than 100 seconds.
How to use Git on OSC
To seamlessly transition from working on OSC and working on my personal computer, I spent some time looking up how to use Git. First, on OSC, I create an SSH key by typing in
ssh-keygen -t ed25519 -C username@email.com
and replace the email with your own preferred email associated with your github account and I chose not to enter a password. Then, I typed in
eval “$(ssh-agent -s)”
to get the the following output: $\texttt{Agent pid #####}$
We should now have a folder created (in the location $\texttt{~/.ssh/}$). In this folder, we want to create a file called $\texttt{config}$ using
vim ~/.ssh/config
and in this file, we type in
$\texttt{Host *}$
$\quad \texttt{AddKeysToAgent yes}$
$\quad \texttt{IdentityFile ~/.ssh/id_ed25519}$
and then type the command
ssh-add ~/.ssh/id_ed25519.pub
Then, on GitHub, go to Settings $\rightarrow$ SSH and GPG Keys $\rightarrow$ New SSH Key. The title can be anything. In the Key Textbox, copy and paste in the contents of $\texttt{id_ed25519.pub}$. Then, on the OSC terminal, type
ssh -T git@github.com
and say yes to the prompt. At this point, we should be set up with SSH. To clone a GitHub repo, we go to a GitHub repository page and click Code $\rightarrow$ SSH and copy to clipboard. then we type in the OSC terminal
git clone clipboard
where clipboard is replaced with what we just copied. By default, we will have just copied the master branch. To download another branch, we use the $\texttt{checkout}$ command
git checkout branch-name
or if the branch is not recognized, we can specify
git checkout -b branch-name origin/branch-name
which will switch to a different branch and the $\texttt{branch}$ command
git branch
will list the available branches to choose from that are currently downloaded on the local (on OSC) repository. After making some changes, we can see all the changes by typing in
git status
and if we want to commit these changes, first we add them by using
git add -A
for all files, or instead of $\texttt{-A}$ we can just specifiy files. Then, we would type in
git commit -m “Message”
where we put a description of our changes for Message. Finally, we need to push these changes to GitHub by typing in
git push -u origin branch_name
If you get an error about $\texttt{gnome-ssh-askpass}$, just type in $\texttt{unset SSH_ASKPASS}$. Then, we can check the place where we will push the changes by typing in
git remote -v
if the output starts with $\texttt{https}$ we are pushing with HTTPS and we will need an authentication token. However, we want to push with SSH. To change this, we can delete the current origin alias using $\texttt{git remote rm origin}$ and add a new origin with $\texttt{git remote add origin git@github.com:ronak-n-desai/fuchs-nn.git}$. Then we type in the following command to push changes
git push -u origin branch_name
As long as everything is safely stored on GitHub, I can delete local repositories and simply redownload them by just typing in the $\texttt{git clone}$ command referenced earlier. When making changes in git, it is best practice to create a new branch first by typing in
git checkout -b branch_name
Then, after making whatever changes we need, we add/commit them and move back to the master branch with $\texttt{git checkout master}$. After this, we merge the branches with
git merge branch_name
and can then delete this branch that is no longer needed with
git branch -d branch_name
An additional thing to note about the private key ($\texttt{id_ed25519}$) and public key ($\texttt{id_ed25519.pub}$) are their permissions. I can see them in the following snapshot
The permission are r (read), w (write), x (execute) and can be specified for the user, group, and everyone which is why there are a total of 9 permissions to change. We can see that the private key only has read and write permission by the user, so it should not be visible to anyone else on OSC.
Things to Do
- Experiment with amount of Noise in NN Model
- Change Log Scaling to be less ad hoc (Log and then StandardScaler)