Noise in GP, changed SVR params, Using Git

Testing Different GP Models

I trained a GP Model on 100, 1000, and 10000 point datasets (80% training 20% testing) and stored the models to a file. Then, I loaded in those models and used them to predict all the values of the 1000 point dataset. I also just plotted the untrained GP model prediction as well. Below, I have the results, where I’ve plotted the predicted max proton energy from the model in comparison to the exact max proton energy from the noiseless fuchs model (magenta, dashed) vs the thickness.

download

The light blue shaded region is the confidence region from the gaussian process model (2 $\sigma$ above and below where $\sigma$ is the standard deviation or noise at each point). We can see that the untrained model has a very low confidence, but all the rest of the models have very high confidence and produce good agreement with the fuchs model. Labeled on the graphs are the Noisy Error (percentage error between noisy fuchs model and GP model) and Exact Error (percentage error between noiseless fuchs model and GP model). We see that using fewer data points actually lowers the error with the noiseless fuchs model. This might be because if we are using too many data points, we are just overfitting to the noise, when in reality we want to focus on the underlying true noiseless model.

I took the average value of the noise $\sigma$ and reported it on the graphs as well. We can see that $\sigma$ is close to the value of RMSE (root mean square error) which indicates that the noise level in the gaussian process model can be used as an estimate for the RMSE with the noisy fuchs data. Additionally, I converted this to a percent error estimate by dividing $\sigma$ by the max proton energy prediction from the model, taking the average at each point, and multiplying by $\sqrt{\frac{2}{\pi}}$ for the conversion from gaussian error to MAPE.

Finding better SVR params

I had thought that the SVR was not running very efficiently, since the Gaussian Process was doing much better when I didn’t have a reason to suspect it should do much better. I changed some of the SVR hyperparameters as follows:

Kernel: Poly $\rightarrow$ RBF
Tolerance: 0.001 $\rightarrow$ 0.01
Gamma: auto $\rightarrow$ scale
C: 10 $\rightarrow$ 1

The change to the RBF kernel made a large difference, as well as lowering the tolerance for the stopping criterion. I just changed Gamma and C back to their default values that the rapids SVR algorithm uses. In doing so, I found the SVR to train the Fuchs Data very fast: 100,000 training points in less than 10 seconds (9.96s).

points_v_time

with almost non-existent percent error

pct_error

This easily clears the goal of training 100,000 training points in less than 100 seconds.

How to use Git on OSC

To seamlessly transition from working on OSC and working on my personal computer, I spent some time looking up how to use Git. First, on OSC, I create an SSH key by typing in

ssh-keygen -t ed25519 -C username@email.com

and replace the email with your own preferred email associated with your github account and I chose not to enter a password. Then, I typed in

eval “$(ssh-agent -s)”

to get the the following output: $\texttt{Agent pid #####}$

We should now have a folder created (in the location $\texttt{~/.ssh/}$). In this folder, we want to create a file called $\texttt{config}$ using

vim ~/.ssh/config

and in this file, we type in

$\texttt{Host *}$
$\quad \texttt{AddKeysToAgent yes}$
$\quad \texttt{IdentityFile ~/.ssh/id_ed25519}$

and then type the command

ssh-add ~/.ssh/id_ed25519.pub

Then, on GitHub, go to Settings $\rightarrow$ SSH and GPG Keys $\rightarrow$ New SSH Key. The title can be anything. In the Key Textbox, copy and paste in the contents of $\texttt{id_ed25519.pub}$. Then, on the OSC terminal, type

ssh -T git@github.com

and say yes to the prompt. At this point, we should be set up with SSH. To clone a GitHub repo, we go to a GitHub repository page and click Code $\rightarrow$ SSH and copy to clipboard. then we type in the OSC terminal

git clone clipboard

where clipboard is replaced with what we just copied. By default, we will have just copied the master branch. To download another branch, we use the $\texttt{checkout}$ command

git checkout branch-name

or if the branch is not recognized, we can specify

git checkout -b branch-name origin/branch-name

which will switch to a different branch and the $\texttt{branch}$ command

git branch

will list the available branches to choose from that are currently downloaded on the local (on OSC) repository. After making some changes, we can see all the changes by typing in

git status

and if we want to commit these changes, first we add them by using

git add -A

for all files, or instead of $\texttt{-A}$ we can just specifiy files. Then, we would type in

git commit -m “Message”

where we put a description of our changes for Message. Finally, we need to push these changes to GitHub by typing in

git push -u origin branch_name

If you get an error about $\texttt{gnome-ssh-askpass}$, just type in $\texttt{unset SSH_ASKPASS}$. Then, we can check the place where we will push the changes by typing in

git remote -v

if the output starts with $\texttt{https}$ we are pushing with HTTPS and we will need an authentication token. However, we want to push with SSH. To change this, we can delete the current origin alias using $\texttt{git remote rm origin}$ and add a new origin with $\texttt{git remote add origin git@github.com:ronak-n-desai/fuchs-nn.git}$. Then we type in the following command to push changes

git push -u origin branch_name

As long as everything is safely stored on GitHub, I can delete local repositories and simply redownload them by just typing in the $\texttt{git clone}$ command referenced earlier. When making changes in git, it is best practice to create a new branch first by typing in

git checkout -b branch_name

Then, after making whatever changes we need, we add/commit them and move back to the master branch with $\texttt{git checkout master}$. After this, we merge the branches with

git merge branch_name

and can then delete this branch that is no longer needed with

git branch -d branch_name

An additional thing to note about the private key ($\texttt{id_ed25519}$) and public key ($\texttt{id_ed25519.pub}$) are their permissions. I can see them in the following snapshot

The permission are r (read), w (write), x (execute) and can be specified for the user, group, and everyone which is why there are a total of 9 permissions to change. We can see that the private key only has read and write permission by the user, so it should not be visible to anyone else on OSC.

Things to Do

Experiment with amount of Noise in NN Model
Change Log Scaling to be less ad hoc (Log and then StandardScaler)

Written on May 5, 2023