Basic elements of Python
In this lesson we will revisit data types, learn how data can be stored in Python lists, and introduce the concept of objects in programming.
Sources
Like the previous lesson, this lesson is inspired by the Programming with Python lessons from the Software Carpentry organization.
Note
There are some Python cells in this notebook that already contain code. You just need to press Shift-Enter to run those cells. We’re trying to avoid having you race to keep up typing in basic things for the lesson so you can focus on the main points :D.
Data types revisited
Let’s start with some data
We saw a bit about variables and their values in the lesson last week, and we continue today with some variables related to FMI observation stations in Finland. For each station, a number of pieces of information are given, including the name of the station, an FMI station ID number (FMISID), its latitude, its longitude, and the station type. We can store this information and some additional information for a given station in Python as follows:
[1]:
station_name = 'Helsinki Kaivopuisto'
[2]:
station_id = 132310
[3]:
station_lat = 60.15
[4]:
station_lon = 24.96
[5]:
station_type = 'Mareographs'
Here we have 5 values assigned to variables related to a single observation station. Each variable has a unique name and they can store different types of data.
Reminder: Data types and their compatibility
We can explore the different types of data stored in variables using the type() function. Let’s use the cells below to check the data types of the variables station_name, station_id, and station_lat.
[6]:
type(station_name)
[7]:
type(station_id)
[8]:
type(station_lat)
As expected, we see that the station_name is a character string, the station_id is an integer, and the station_lat is a floating point number.
Note
Remember, the data types are important because some are not compatible with one another. What happens when you try to add the variables station_name and station_id in the cell below?
[9]:
station_name + station_id
Here we get a TypeError because Python does not know how to combine a string of characters (station_name) with an integer value (station_id).
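As a small sketch of the incompatibility described here, the cell below tries the same addition inside a try/except block so the error can be inspected without stopping the notebook. The variable combined and the use of try/except are additions for illustration and are not part of the lesson cells.

```python
station_name = 'Helsinki Kaivopuisto'
station_id = 132310

try:
    # Adding a str and an int is not defined, so this line raises TypeError
    combined = station_name + station_id
except TypeError as error:
    combined = None
    print('TypeError:', error)
```

Because the addition fails, combined never receives a joined value and is set to None in the except branch.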
Converting data from one type to another
Values such as station_name and station_id can still be combined, but to combine a character string with a number we first need to perform a data type conversion to make them compatible. Let’s convert station_id to a character string using the str() function. We can store the converted variable as station_id_str.
[10]:
station_id_str = str(station_id)
We can confirm the type has changed by checking the type of station_id_str, or by checking the output when you type the name of the variable into a cell and run it.
[11]:
type(station_id_str)
[12]:
station_id_str
As you can see, str() converts a numerical value into a character string with the same numbers as before.
Note
Similar to using str() to convert numbers to character strings, int() can be used to convert strings or floating point numbers to integers and float() can be used to convert strings or integers to floating point numbers.
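The conversions mentioned in this note could look like the sketch below. The variable names lat_truncated, lat_from_text, and id_as_float are made up for this illustration. Note that int() drops the decimal part of a floating point number rather than rounding it.

```python
station_id_str = '132310'
station_lat = 60.15

station_id = int(station_id_str)   # string -> integer
lat_truncated = int(station_lat)   # float -> integer (decimal part is dropped)
lat_from_text = float('60.15')     # string -> float
id_as_float = float(station_id)    # integer -> float

print(station_id, lat_truncated, lat_from_text, id_as_float)
```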
Attention
Poll pause - Questions 2.2, 2.3
Please visit the class polling page to participate (only for those present during the lecture time).
Combining text and numbers
Although most mathematical operations operate on numerical values, a common way to combine character strings is using the addition operator +. Let’s create a text string in the variable station_name_and_id that is the combination of the station_name and station_id variables. Once we define station_name_and_id, we can print it to the screen to see the result.
[13]:
station_name_and_id = station_name + ": " + str(station_id)
[14]:
print(station_name_and_id)
Note that here we are converting station_id to a character string using the str() function within the assignment to the variable station_name_and_id. Alternatively, we could have simply added station_name and station_id_str.
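For comparison, here is a sketch of the same combination written two ways. The second uses a formatted string literal (an f-string, available in Python 3.6 and later), which is not covered in this lesson and is shown only as an alternative; the names with_plus and with_fstring are made up for the example.

```python
station_name = 'Helsinki Kaivopuisto'
station_id = 132310

# Explicit conversion with str(), as in the lesson
with_plus = station_name + ': ' + str(station_id)

# An f-string converts the value to text for us
with_fstring = f'{station_name}: {station_id}'

print(with_plus)
print(with_fstring)
```

Both approaches produce the same text, so the choice between them is mostly a matter of readability.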
Lists and indices
Above we have seen a bit of data related to one of several FMI observation stations in the Helsinki area. Rather than having individual variables for each of those stations, we can store many related values in a collection. The simplest type of collection in Python is a list.
Creating a list
Let’s first create a list of selected station_name values and print it to the screen.
[15]:
station_names = ['Helsinki Harmaja', 'Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']
[16]:
print(station_names)
We can also check the type of the station_names list using the type() function.
[17]:
type(station_names)
Here we have 4 station_name values in a list called station_names. As you can see, the type() function recognizes this as a list. Lists can be created using the square brackets [ and ], with commas separating the values in the list.
Index values
To access an individual value in the list we need to use an index value. An index value is a number that refers to a given position in the list. Let’s check out the first value in our list as an example by printing out station_names[1]:
[18]:
print(station_names[1])
Wait, what? This is the second value in the list we’ve created, what is wrong? As it turns out, Python (like many other programming languages) starts counting the values stored in collections from index 0. Thus, to get the value for the first item in the list, we must use index 0. Let’s print out the value at index 0 of station_names below.
[19]:
print(station_names[0])
OK, that makes sense, but it may take some getting used to…
A useful analogy - Bill the vending machine
As it turns out, index values are extremely useful and common in many programming languages, yet they are often a point of confusion for new programmers. Thus, we need a trick for remembering what an index value is and how it is used. For this, we need to be introduced to Bill.
Bill, the vending machine.
As you can see, Bill is a vending machine that contains 6 items. Like Python lists, the list of items available from Bill starts at 0 and increases in increments of 1.
The way Bill works is that you insert your money, then select the location of the item you wish to receive. In an analogy to Python, we could say Bill is simply a list of food items and the buttons you push to get them are the index values. For example, if you would like to buy a taco from Bill, you would push button 3. An equivalent operation in Python could simply be
print(Bill[3])
Taco
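To make the analogy runnable, Bill could be sketched as a list like the one below. Only the taco behind button 3 and the total of 6 items come from the lesson; every other item name here is made up for the example.

```python
# Hypothetical contents: only 'Taco' at index 3 is taken from the lesson
Bill = ['Chips', 'Pretzels', 'Cookies', 'Taco', 'Sandwich', 'Apple']

print(len(Bill))  # number of items in the machine
print(Bill[0])    # the first item sits behind button 0
print(Bill[3])    # button 3 gives the taco
```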
Number of items in a list
We can find the length of a list using the len() function. Use it below to check the length of the station_names list.
[20]:
len(station_names)
Just as expected, there are 4 values in our list and len(station_names) returns a value of 4.
Index value tips
If we know the length of the list, we can now use it to find the value of the last item in the list, right? What happens if you print the value from the station_names list at index 4, the value of the length of the list?
[21]:
print(station_names[4])
An IndexError? That’s right, since our list starts with index 0 and has 4 values, the index of the last item in the list is len(station_names) - 1. That isn’t ideal, but fortunately there’s a nice trick in Python to find the last item in a list. Let’s first print the station_names list to remind us of the values that are in it.
[22]:
print(station_names)
To find the value at the end of the list, we can print the value at index -1. To go further up the list in reverse, we can simply use larger negative numbers, such as index -4. Let’s print out the values at these indices below.
[23]:
print(station_names[-1])
[24]:
print(station_names[-4])
Yes, in Python you can go backwards through lists by using negative index values. Index -1 gives the last value in the list and index -len(station_names) would give the first. Of course, you still need to keep the index values within their ranges. What happens if you check the value at index -5?
[25]:
print(station_names[-5])
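The index rules discussed in this section can be summarized in one sketch. It uses the four-item station_names list from above and a try/except block (not introduced in the lesson) to record which index values fall outside the valid range; the names last, first, and out_of_range are made up for the example.

```python
station_names = ['Helsinki Harmaja', 'Helsinki Kaisaniemi',
                 'Helsinki Kaivopuisto', 'Helsinki Kumpula']

last = station_names[-1]                    # same item as station_names[len(station_names) - 1]
first = station_names[-len(station_names)]  # same item as station_names[0]
print(last, '|', first)

# Valid indices for 4 items run from -4 to 3; anything else raises IndexError
out_of_range = []
for index in (4, -5):
    try:
        station_names[index]
    except IndexError:
        out_of_range.append(index)
print('Out of range:', out_of_range)
```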
Attention
Poll pause - Question 2.4
Please visit the class polling page to participate (only for those present during the lecture time).
Modifying list values
Another nice feature of lists is that they are mutable, meaning that the values in a list that has been defined can be modified. Consider a list of the observation station types corresponding to the station names in the station_names list.
[26]:
station_types = ['Weather stations', 'Weather stations', 'Weather stations', 'Weather stations']
print(station_types)
Let’s change the value for station_types[2] to be 'Mareographs' and print out the station_types list again.
[27]:
station_types[2] = 'Mareographs'
print(station_types)
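To illustrate what mutable means, the sketch below repeats the list assignment and then attempts the same kind of in-place change on a character string. The comparison with strings is an addition to the lesson: strings in Python are immutable, so the second assignment raises a TypeError.

```python
station_types = ['Weather stations', 'Weather stations',
                 'Weather stations', 'Weather stations']
station_types[2] = 'Mareographs'  # lists are mutable: assigning to an index works
print(station_types)

station_type = 'Mareographs'
try:
    station_type[0] = 'm'         # strings are immutable: this raises TypeError
except TypeError as error:
    print('TypeError:', error)
```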
Data types in lists
Lists can also store more than one type of data. Suppose that, in addition to having lists of the station names, FMISIDs, latitudes, and so on, we would like to have a single list of all of the values for the station ‘Helsinki Kaivopuisto’.
[28]:
station_hel_kaivo = [station_name, station_id, station_lat, station_lon, station_type]
print(station_hel_kaivo)
Here we have one list with 3 different types of data in it. We can confirm this using the type() function. Let’s check the type of station_hel_kaivo, then the types of the values at indices 0-2 in the cells below.
[29]:
type(station_hel_kaivo)
[30]:
type(station_hel_kaivo[0]) # The station name
[31]:
type(station_hel_kaivo[1]) # The FMISID
[32]:
type(station_hel_kaivo[2]) # The station latitude
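Rather than checking each index in its own cell, the types of all five values could be collected in one go, as in the sketch below. It uses a list comprehension, which has not been introduced in this lesson, and the name type_names is made up for the example.

```python
station_hel_kaivo = ['Helsinki Kaivopuisto', 132310, 60.15, 24.96, 'Mareographs']

# Collect the name of each value's type, in list order
type_names = [type(value).__name__ for value in station_hel_kaivo]
print(type_names)

# The distinct types present in the list
print(sorted(set(type_names)))
```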
Adding and removing values from lists
Finally, we can add and remove values from lists to change their lengths. Let’s consider that we no longer want to include the first value in the station_names list. Since we haven’t seen that list in a bit, let’s first print it to the screen.
[33]:
print(station_names)
del allows values in lists to be removed. It can also be used to delete values from memory in Python. To remove the first value from the station_names list, we can simply type del station_names[0]. If you then print out the station_names list, you should see the first value has been removed.
[34]:
del station_names[0]
[35]:
print(station_names)
If we would instead like to add a few values to the station_names list, we can type station_names.append('List item to add'), where 'List item to add' would be the text that would be added to the list in this example. Let’s add two values to our list in the cells below: 'Helsinki lighthouse' and 'Helsinki Malmi airfield'. After doing this, let’s check the list contents by printing to the screen.
[36]:
station_names.append('Helsinki lighthouse')
station_names.append('Helsinki Malmi airfield')
[37]:
print(station_names)
As you can see, we add values one at a time using station_names.append(). list.append() is called a method in Python, which is a function that works for a given data type (a list in this case). We’ll see some other examples of useful list methods below.
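The removal and the two additions from this section are collected in the sketch below, starting again from the original four-item list so that the cell can be run on its own.

```python
station_names = ['Helsinki Harmaja', 'Helsinki Kaisaniemi',
                 'Helsinki Kaivopuisto', 'Helsinki Kumpula']

del station_names[0]                             # remove the first value
station_names.append('Helsinki lighthouse')      # add one value to the end
station_names.append('Helsinki Malmi airfield')  # add another value to the end

print(station_names)
print(len(station_names))
```

The list ends up one item longer than it started: one value removed, two added.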
Appending to an integer? Not so fast…
Let’s consider our list station_names. As we know, we already have data in the list station_names, and we can modify that data using built-in methods such as station_names.append(). In this case, the method append() is something that exists for lists, but not for other data types. It is intuitive that you might like to add (or append) things to a list, but perhaps it does not make sense to append to other data types. Below, let’s create a variable station_name_length that we can use to store the length of the list station_names. We can then print the value of station_name_length to confirm the length is correct.
[38]:
station_name_length = len(station_names)
[39]:
print(station_name_length)
If we check the data type of station_name_length, we can see it is an integer value, as expected (do that below). What happens if you try to append the value 1 to station_name_length?
[40]:
type(station_name_length)
[41]:
station_name_length.append(1)
Here we get an AttributeError because there is no method built in to the int data type to append to int data. While append() makes sense for list data, it is not sensible for int data, which is the reason no such method exists for int data.
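One way to see which data types offer a given method is the built-in hasattr() function, which is not covered in the lesson and is used here only as an illustration. The sketch assumes the list had 5 values when its length was stored.

```python
station_name_length = 5  # an int, such as the value returned by len()

print(hasattr([], 'append'))                   # lists have an append method
print(hasattr(station_name_length, 'append'))  # ints do not

try:
    station_name_length.append(1)
except AttributeError as error:
    print('AttributeError:', error)
```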
Some other useful list methods
With lists we can do a number of useful things, such as count the number of times a value occurs in a list or where it occurs. The list.count() method can be used to find the number of instances of an item in a list. For instance, we can check to see how many times 'Helsinki Kumpula' occurs in our list station_names by typing station_names.count('Helsinki Kumpula').
[42]:
station_names.count('Helsinki Kumpula') # The count method counts the number of occurrences of a value
Similarly, we can use the list.index() method to find the index value of a given item in a list. Let’s use the cell below to find the index of 'Helsinki Kumpula' in the station_names list.
[43]:
station_names.index('Helsinki Kumpula') # The index method gives the index value of an item in a list
The good news here is that our selected station name is only in the list once. Should we need to modify it for some reason, we also now know where it is in the list (index 2).
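The two methods behave differently when the value is not in the list, which the sketch below illustrates: count() returns 0, while index() raises a ValueError. The variable missing and the try/except block are additions for this example; the list is the five-item version produced earlier in the lesson.

```python
station_names = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula',
                 'Helsinki lighthouse', 'Helsinki Malmi airfield']

print(station_names.count('Helsinki Kumpula'))  # number of occurrences
print(station_names.index('Helsinki Kumpula'))  # position of the first occurrence

missing = 'Helsinki Harmaja'                    # removed from the list earlier
print(station_names.count(missing))             # count() reports 0 for absent values
try:
    station_names.index(missing)                # index() raises ValueError instead
except ValueError:
    print(missing, 'is not in the list')
```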
Reversing a list
There are two other common methods for lists that we need to see. First, there is the list.reverse() method, used to reverse the order of items in a list. Let’s reverse our station_names list below and then print the results.
[44]:
station_names.reverse()
[45]:
print(station_names)
Yay, it works!
Caution
A common mistake when reversing lists is to do something like station_names = station_names.reverse(). Do not do this! When reversing lists with .reverse() the None value is returned (this is why there is no screen output when running station_names.reverse()). If you then assign the output of station_names.reverse() to station_names you will reverse the list, but then overwrite its contents with the returned value None. This means you’ve deleted the contents of your list (!).
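The pitfall described in this caution can be seen safely by storing the return value in a separate variable, as in the sketch below. The slice station_names[::-1], which builds a reversed copy and leaves the original list untouched, is an alternative that is not part of the lesson; the names result and reversed_copy are made up for the example.

```python
station_names = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula']

result = station_names.reverse()     # reverses the list in place
print(result)                        # the method itself returns None
print(station_names)                 # the list is now in reverse order

reversed_copy = station_names[::-1]  # a new list in the opposite order
print(reversed_copy)
```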
Sorting a list
The list.sort() method works the same way. Let’s sort our station_names list and print its contents below.
[46]:
station_names.sort() # Notice no output here...
[47]:
print(station_names)
As you can see, the list has been sorted alphabetically using the list.sort() method, but there is no screen output when this occurs. Again, if you were to assign that output to station_names the list would get sorted, but the contents would then be assigned None.
Note
As you may have noticed, Helsinki Malmi airfield comes before Helsinki lighthouse in the sorted list. This is because Python compares strings character by character using their Unicode code points, and all capital letters come before the lowercase letters in that ordering.
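If a sort that ignores capitalization is preferred, one option is to pass key=str.lower to list.sort(), so that each name is compared in its lowercase form. This is a sketch that goes beyond the lesson; the names default_order and case_insensitive_order are made up for the example.

```python
station_names = ['Helsinki Kaisaniemi', 'Helsinki Kaivopuisto', 'Helsinki Kumpula',
                 'Helsinki lighthouse', 'Helsinki Malmi airfield']

station_names.sort()                       # default: capital letters sort first
default_order = list(station_names)        # keep a copy of the result
print(default_order)

station_names.sort(key=str.lower)          # compare lowercase versions instead
case_insensitive_order = list(station_names)
print(case_insensitive_order)
```

The strings themselves are not changed by the key function; it only affects how they are compared.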