Exercise 7¶
Warning
Please note that we provide assignment feedback only for students enrolled in the course at the University of Helsinki.
Start your assignment
Pandas
You can start working on your copy of Exercise 7 (Pandas version) by accepting the GitHub Classroom assignment.
Exercise 7 is due by 16:00 on 28.10.
You can also take a look at the open course copy of Exercise 7 (Pandas version) in the course GitHub repository (does not require logging in). Note that you should not try to make changes to this copy of the exercise, but rather only to the copy available via GitHub Classroom.
NumPy
You can start working on your copy of Exercise 7 (NumPy version) by accepting the GitHub Classroom assignment.
Exercise 7 is due by 16:00 on 28.10.
You can also take a look at the open course copy of Exercise 7 (NumPy version) in the course GitHub repository (does not require logging in). Note that you should not try to make changes to this copy of the exercise, but rather only to the copy available via GitHub Classroom.
Hints for Exercise 7¶
Labels and legends¶
In the plot for Problem 3 you’re asked to include a line legend for each subplot. To do this, you need to do two things:
- You need to add a label value when you create the plot using the plt.plot() function. This is as easy as adding a parameter that says label='some text' when you call plt.plot().
- You’ll need to display the line legend, which can be done by calling plt.legend() for each subplot.
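As a rough sketch of these two steps, the example below draws two subplots and adds a legend to each. The data values, label texts, and figure size are made up for illustration, and the Agg backend is selected only so that the sketch runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Made-up values purely for illustration
years = [2000, 2001, 2002, 2003]
winter = [-1.2, -0.4, 0.3, -0.8]
summer = [0.5, 1.1, 0.9, 1.4]

fig = plt.figure(figsize=(8, 6))

# First subplot: pass label= when plotting, then call plt.legend()
plt.subplot(2, 1, 1)
plt.plot(years, winter, label="Winter")
plt.legend()

# Second subplot: the same two steps again
plt.subplot(2, 1, 2)
plt.plot(years, summer, label="Summer")
plt.legend()
```

Note that plt.legend() acts on the currently active subplot, which is why it is called once after each plt.plot() call here.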
Using enumerate()¶
In case the enumerate() function is causing you some confusion, here is a simple example.
The general idea is that enumerate() returns both the index of each item in a list and the item itself as you loop over the list.
Let’s see if this helps…
In [1]: animals = ['dog', 'cat', 'frog']
In [2]: for index, animal in enumerate(animals):
   ...:     print(animal, 'is in location', index)
   ...:
dog is in location 0
cat is in location 1
frog is in location 2
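If it helps, here is one more small sketch of the same idea. It shows the (index, value) pairs that enumerate() produces and the optional start argument, which can be handy when numbering things from 1. The seasons list is just an illustrative stand-in.

```python
seasons = ["Winter", "Spring", "Summer", "Fall"]

# enumerate() yields (index, value) pairs; indices start from 0 by default
pairs = list(enumerate(seasons))
print(pairs)

# The optional start argument changes the first index
for number, season in enumerate(seasons, start=1):
    print("Subplot", number, "shows", season)
```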
Saving multiple plots into a directory¶
In Problems 3 and 4 the aim is to create 65 individual plots and save them to your computer.
In this kind of situation, the smartest thing to do is to use a for loop and, at the end of each iteration, save the image into a folder that you have specified. There are some useful tricks related to saving files and generating good file names automatically.
A good approach when saving multiple files into a folder is to define a separate variable where you store only the directory path. Then, on every iteration, you combine this directory path and the file name.
This can be done with the os.path.join() function, which is part of Python's built-in os module.
Consider the following example:
In [3]: import os
In [4]: myfolder = r"C:\MyUserName\Temp_visualizations"
In [5]: for i in range(5):
   ...:     filename = "My_File_" + str(i) + ".png"
   ...:     filepath = os.path.join(myfolder, filename)
   ...:     print(filepath)
   ...:
C:\MyUserName\Temp_visualizations/My_File_0.png
C:\MyUserName\Temp_visualizations/My_File_1.png
C:\MyUserName\Temp_visualizations/My_File_2.png
C:\MyUserName\Temp_visualizations/My_File_3.png
C:\MyUserName\Temp_visualizations/My_File_4.png
Here, we created a folder path and a unique file name, and then combined them into a full file path that could be used to save a plot to that location on your computer.
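A possible variation on this pattern is sketched below. It creates the output folder if it does not exist yet and pads the number in each file name with zeros, so that the files sort in the same order they were created. The folder location (a temporary directory) and the file name pattern are assumptions made for this sketch; replace them with your own.

```python
import os
import tempfile

# Hypothetical output folder inside a temporary directory (replace with your own path)
output_folder = os.path.join(tempfile.mkdtemp(), "Temp_visualizations")

# Create the folder if it does not exist yet
os.makedirs(output_folder, exist_ok=True)

filepaths = []
for i in range(3):
    # :03d pads the number with zeros (000, 001, ...) so the names sort in order
    filename = f"My_File_{i:03d}.png"
    filepaths.append(os.path.join(output_folder, filename))

print(os.path.basename(filepaths[0]))
```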
Preventing plot display¶
When creating the series of images needed for the animation in Problem 5, you may be stuck with many plots being displayed in JupyterLab.
You can suppress the display of plots by calling plt.close() after the plt.savefig(...) command.
In other words, you can do
...
plt.savefig(...)
plt.close()
...
which will close the plot before it would normally be displayed.
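Put together with a loop, this could look something like the sketch below. The plotted data, titles, and file names are placeholders, and a temporary directory stands in for your own output folder. The Agg backend is chosen only so that the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

output_folder = tempfile.mkdtemp()  # hypothetical folder for this sketch

for i in range(3):
    plt.plot([0, 1, 2], [0, i, 2 * i])  # placeholder data
    plt.title("Plot " + str(i))
    plt.savefig(os.path.join(output_folder, "plot_" + str(i) + ".png"))
    plt.close()  # close the figure so it is not displayed or kept open

saved_files = sorted(os.listdir(output_folder))
open_figures = plt.get_fignums()  # list of figures that are still open
```

Closing each figure inside the loop also means that every iteration starts from an empty figure instead of drawing on top of the previous plot.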
Creating an animation from multiple images¶
In Problems 3 and 4 the aim was to save multiple images into a predefined folder. An optional task was to create an animation out of those figures. Animating the figures from Problems 3 and 4 is a fairly straightforward task in Python. All you need to do is install a module called imageio and run the couple of lines of code shown below.
But first, you need to install the imageio module.
Installing the module can be done by running the following command from the command prompt / terminal with admin rights:
$ conda install -c conda-forge imageio
Note
If everything works fine, you should not see any errors on the screen. If you receive an error, the most typical cause is that you did not have admin rights when trying to install the module. In that case, you should open the command prompt with admin rights (Command prompt –> right click –> Run as administrator).
Once you have imageio installed, you should be able to import it in Spyder:
In [6]: import imageio
Creating the animation¶
The following commands should produce a nice GIF animation out of your plots. The idea is that you list all the files from the folder where you saved the plots using the glob.glob() function, and then pass the images from that file list to the imageio function imageio.mimsave(). The following example shows how to do that.
First, we use glob to list all the files in the folder that have the .png file format. The * wildcard character tells the computer that the name of the file can be anything, and the .png after the star means that the file name should end with .png.
Any files in a format other than .png will be excluded.
Finally, we save the animation to disk.
import glob
import imageio
# Find all files in the given folder that have the .png file format
search_criteria = r"C:\MyUserName\Temp_visualizations\*.png"
# Execute the glob function, which returns a list of file paths
# (sorted so that the frames are in file name order)
figure_paths = sorted(glob.glob(search_criteria))
# Save the animation to disk with a frame duration of 0.48 s (480 ms)
output_gif_path = r"C:\MyUserName\Temp_animation.gif"
imageio.mimsave(output_gif_path, [imageio.imread(fp) for fp in figure_paths], duration=0.48, subrectangles=True)
With these lines of code you should be able to create a nice animation out of your plots!
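One detail worth knowing is that glob.glob() does not promise any particular order for the paths it returns, so it can be a good idea to sort the list before building the animation. The sketch below shows the wildcard matching and the sorting using a few empty placeholder files in a temporary folder. The folder and the file names are made up for illustration.

```python
import glob
import os
import tempfile

folder = tempfile.mkdtemp()  # hypothetical folder standing in for your plot folder

# Create a few empty placeholder files, deliberately out of order
for name in ["plot_002.png", "plot_000.png", "notes.txt", "plot_001.png"]:
    open(os.path.join(folder, name), "w").close()

# *.png matches any file name that ends with .png, so notes.txt is skipped
figure_paths = sorted(glob.glob(os.path.join(folder, "*.png")))

figure_names = [os.path.basename(fp) for fp in figure_paths]
print(figure_names)
```

Keep in mind that sorted() compares the names as text, so zero-padded numbers (000, 001, ...) or four-digit years in the file names keep the frames in the expected order.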
NumPy-specific hints¶
Extracting seasonal dates and temperatures (in many years)¶
One of the tasks this week is to split many years of temperature anomaly data into seasonal groups (arrays in our case).
While it is possible to use the values in the date_monthly array to do this, your life may be easier if you simply use only the months of the seasons to split the data into separate seasonal arrays.
You can do this using masks, and although it is not totally correct, you can feel free to split your data into the following season month ranges (all within a given year).
Season | Months |
---|---|
Winter | 12, 1, 2 |
Spring | 3-5 |
Summer | 6-8 |
Fall | 9-11 |
The main point here is that although the winter of 1953 would normally include December 1952, January of 1953, and February of 1953, you can feel free to use the anomalies from January, February, and December of 1953. Of course, you’re welcome to try to figure out how to do this the “right” way, but it is more challenging :).
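To give a rough idea of how the masks could work, here is a minimal sketch using made-up data for two years. The array names (months, years, anomaly) and the values are assumptions for this example only. In the exercise you would build the month and year arrays from your own data.

```python
import numpy as np

# Made-up monthly data for two years (placeholder values, not real anomalies)
months = np.tile(np.arange(1, 13), 2)          # 1..12 repeated for both years
years = np.repeat(np.array([1953, 1954]), 12)  # each year repeated 12 times
anomaly = np.arange(24, dtype=float)

# Boolean masks: True where the month belongs to the season
winter_mask = (months == 12) | (months <= 2)
spring_mask = (months >= 3) & (months <= 5)

# Applying a mask keeps only the values where the mask is True
anomaly_winter = anomaly[winter_mask]
year_winter = years[winter_mask]
anomaly_spring = anomaly[spring_mask]
```

The parentheses around each comparison are needed because | and & bind more tightly than == and <= in Python.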
Finding seasonal average temperatures (by year)¶
When averaging the seasonal temperatures, we can take advantage of knowing how many years of seasonal values we will have (i.e., the number of unique years in our dataset).
You can use this to create some arrays (of zeros, for example) to store the seasonal average values.
Once you have those arrays, you can use a for loop to go over each year and store the average anomaly values for each season.
An example of this kind of loop is below.
index = 0
for year in unique_years:
    winter_yearly[index] = anomaly_season[year_season.astype(int) == year].mean()
    index += 1
The idea here is that you can easily loop over each year, check the condition that the year of the data slice equals the year in the loop, extract that slice from the anomaly data, and calculate the mean.
There are other ways you could write this same loop, but here we use index to place the seasonal average values in the correct location in each array.
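As one example of another way, the sketch below does the same thing with enumerate(), which supplies the index so that no separate counter is needed. The input arrays here are small made-up examples (not real data), and the names year_winter and anomaly_winter are assumptions for this sketch.

```python
import numpy as np

# Made-up winter values for two years (placeholders, not real data)
year_winter = np.array([1953, 1953, 1953, 1954, 1954, 1954])
anomaly_winter = np.array([0.0, 1.0, 11.0, 12.0, 13.0, 23.0])

# One slot per unique year for storing the seasonal averages
unique_years = np.unique(year_winter)
winter_yearly = np.zeros(len(unique_years))

# enumerate() gives both the position in the array and the year itself
for index, year in enumerate(unique_years):
    winter_yearly[index] = anomaly_winter[year_winter == year].mean()

print(winter_yearly)
```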