(2/26/21) A short primer on using Optuna and ECHO to interact with a sql database

1. We have access to a MariaDB located on thunder

MariaDB: MySQL relational database management system.

The MariaDB server is accessible from an NCAR IP address, but you cannot login to MariaDB as root remotely. To interact with the database as root, you would need to ssh to thunder and from there you will be able to login to MariaDB as root to setup/manage the database. This will not affect the interaction between optuna and the database, but we will need root in order to manage the database (future).

In this blog, we have a database named “optuna”. For demonstrating purposes, we imagine that a user “icarus” exists. If you are at NCAR and are experimenting with mysql + optuna, you may email John Schreck about obtaining access. Ordinarily, to get onto thunder, you will use your NCAR password (same as for casper, cheyenne, etc).

2. Optuna does not have much to say with regards to its sql support.

In general, this interaction is low-level, while your interaction with optuna is much higher. To that end, the simplest way to go about managing your studies is to use the create_study and delete_study methods.

You may continue to use the sqlite “storage”, but be warned that once 1000 trials are saved to the named study, the performance will degrade quickly. This is especially apparent when running the hyperparameter importance metrics, which query the database and train a tree model on the fly.

3. Example: Using create_study and delete_study

First, lets see what tables are in the “optuna” database on thunder (from terminal):

(I shared an ssh key, hence not having to use Duo. Details at the bottom of this tutorial)

Next, lets list the study names user “schreck” has saved into optuna:

Now we create a new study named “example”:

import optuna
study = optuna.create_study(
    study_name="example", 
    storage="mysql://icarus:password@thunder.ucar.edu/optuna"
)

Confirm that the study was actually created by repeating the command from earlier:

Next, in your hyperparameters.yml configuration file, we simply point to the database as follows under the optuna field:

study_name: "example"
storage: "mysql://icarus:password@thunder.ucar.edu/optuna"

You don’t have to worry about entering your sql password, it is already contained in the storage link! Since we are on an NCAR server, we also do not need to use Duo, although this will be changing in the near future. The forth-coming additional security will likely become problematic, but we will deal with that later.

Note that you don’t have to create a study beforehand if it does not exist, the optimize.py script that is used to launch a hyperparameter study, contained in the ECHO package, will call create_study for you:

For now, when its time to delete a study from our optuna database, simply call the optuna method delete_study:

optuna.delete_study(
    study_name="example", 
    storage="mysql://icarus:password@thunder.ucar.edu/optuna"
)

Let us double check that it was actaully removed:

Ordinarily, you set reload = 0 in your hyperparameters.yml file when starting a new study. If the study name already exists, optimize.py/run.py will fail with an error message (I will not delete or overwrite things automatically. That job is left up to you).

When using the sqlite database solution, you simply delete that file. For sql support, the script will still complain at you, but a new parser option has been added that will facilitate the delete_study call:

E.g. you run:

And the study_name will be deleted from the storage container. Note that its gone forever, so be extra careful that this is what you intended.

4. For more, checkout this tutorial on data warehousing.

Feel free to email me (John Schreck, schreck@ucar.edu) with any questions / mistakes / whatever!