Exercise 5

Warning

Please note that we provide assignment feedback only for students enrolled in the course at the University of Helsinki.

Start your assignment

You can start working on your copy of Exercise 5 by accepting the GitHub Classroom assignment.

Exercise 5 is due by the start of lecture in week 6.

You can also take a look at the open course copy of Exercise 5 in the course GitHub repository (does not require logging in). Note that you should not try to make changes to this copy of the exercise, but rather only to the copy available via GitHub Classroom.

Exercise 5 hints for Pandas

Below are some tips for working on Exercise 5.

Selecting date ranges

In the Problem 4 part 2, the aim is to select rows that belong to certain month. The key here is to understand that the data values in YR--MODAHRMN column are integer numbers using a format YYYYMMDDHHmm where YYYY is the year of the observation, MM is the month, DD is the day, HH is the hour, and mm is the minute.

Using these values it is possible to make simple mathematical queries such as finding the values starting from August:

august_values = data.loc[data['YR--MODAHRMN'] >= 201708010000]

Here, the value 201708010000 corresponds to the first day of August at 00:00 hour.

Exercise 5 hints for NumPy

Formatting output to written to files

You can specify the format of the saved data using the fmt parameter with np.savetxt(). Let’s consider an example. We have two columns in a NumPy array called data that we would like to output to a file called test.csv. The first column contains integer values, the second are floating point values that we would like to round to 4 decimal places. We could create a comma-separated data file as follows:

np.savetxt('test.csv', data, delimiter=',', fmt='%i, %.4f')

In this case, the fmt parameter should contain two values separated by a comma, one for each output format. The % sign indicates a variable for the output (one of the columns), i indicates an integer value, and .4f indicates a floating point value with 4 numbers after the decimal point.

You can find additional data about formatting output at https://pyformat.info. NumPy, for example, uses the “old” formatting style mentioned on that site.