More on Lists

We have already seen lists and how they can be used. Now that you have some more background I will go into more detail about lists. First we will look at more ways to get at the elements in a list and then we will talk about copying them.

Here are some examples of using indexing to access a single element of an list:

>>> list = ['zero', 'one', 'two', 'three', 'four', 'five']
>>> list[0]
'zero'
>>> list[4]
'four'
>>> list[5]
'five'
All those examples should look familiar to you. If you want the first item in the list just look at index 0. The second item is index 1 and so on through the list. However, what if you want the last item in the list? One way could be to use the len function like list[len(list)-1]. This way works since the len function always returns the last index plus one. The second from the last would then be list[len(list)-2]. There is an easier way to do this. In Python the last item is always index -1. The second to the last is index -2 and so on. Here are some more examples:
>>> list[len(list)-1]
'five'
>>> list[len(list)-2]
'four'
>>> list[-1]
'five'
>>> list[-2]
'four'
>>> list[-6]
'zero'
Thus any item in the list can be indexed in two ways: from the front and from the back.

Another useful way to get into parts of lists is using slices. Here is another example to give you an idea what they can be used for:

>>> list = [0, 'Fred', 2, 'S.P.A.M.', 'Stocking', 42, "Jack", "Jill"]
>>> list[0]
0
>>> list[7]
'Jill'
>>> list[0:8]
[0, 'Fred', 2, 'S.P.A.M.', 'Stocking', 42, 'Jack', 'Jill']
>>> list[2:4]
[2, 'S.P.A.M.']
>>> list[4:7]
['Stocking', 42, 'Jack']
>>> list[1:5]
['Fred', 2, 'S.P.A.M.', 'Stocking']
Slices are used to return part of a list. The slice operator is in the form list[first_index:following_index]. The slice goes from the first_index to the index before the following_index. You can use both types of indexing:
>>> list[-4:-2]
['Stocking', 42]
>>> list[-4]
'Stocking'
>>> list[-4:6]
['Stocking', 42]
Another trick with slices is the unspecified index. If the first index is not specified the beginning of the list is assumed. If the following index is not specified the whole rest of the list is assumed. Here are some examples:
>>> list[:2]
[0, 'Fred']
>>> list[-2:]
['Jack', 'Jill']
>>> list[:3]
[0, 'Fred', 2]
>>> list[:-5]
[0, 'Fred', 2]
Here is a program example:
poem = ["<B>", "Jack", "and", "Jill", "</B>", "went", "up", "the", "hill",\
        "to", "<B>", "fetch", "a", "pail", "of", "</B>", "water.", "Jack",\
        "fell", "<B>", "down", "and", "broke", "</B>", "his", "crown", "and",\
        "<B>", "Jill", "came", "</B>", "tumbling", "after"]

def get_bolds(list):
    ## is_bold tells whether or not the we are currently looking at
    ## a bold section of text.
    is_bold = False
    ## start_block is the index of the start of either an unbolded
    ## segment of text or a bolded segment.
    start_block = 0
    for index in range(len(list)):
        ##Handle a starting of bold text
        if list[index] == "<B>":
            if is_bold:
                print("Error:  Extra Bold")
            ##print "Not Bold:", list[start_block:index]
            is_bold = True
            start_block = index+1
        ##Handle end of bold text
        ##Remember that the last number in a slice is the index
        ## after the last index used.
        if list[index] == "</B>":
            if not is_bold:
                print("Error: Extra Close Bold")
            print("Bold [", start_block, ":", index, "] ",\
            list[start_block:index])
            is_bold = False
            start_block = index+1

get_bolds(poem)
with the output being:
Bold [ 1 : 4 ]  ['Jack', 'and', 'Jill']
Bold [ 11 : 15 ]  ['fetch', 'a', 'pail', 'of']
Bold [ 20 : 23 ]  ['down', 'and', 'broke']
Bold [ 28 : 30 ]  ['Jill', 'came']

The get_bold function takes in a list that is broken into words and token's. The tokens that it looks for are <B> which starts the bold text and <\B> which ends bold text. The function get_bold goes through and searches for the start and end tokens.

The next feature of lists is copying them. If you try something simple like:

>>> a = [1, 2, 3]
>>> b = a
>>> print(b)
[1, 2, 3]
>>> b[1] = 10
>>> print(b)
[1, 10, 3]
>>> print(a)
[1, 10, 3]
This probably looks surprising since a modification to b resulted in a being changed as well. What happened is that the statement b = a makes b a reference to the same list that a is a reference to. This means that b and a are different names for the same list. Hence any modification to b changes a as well. However some assignments don't create two names for one list:
>>> a = [1, 2, 3]
>>> b = a*2
>>> print(a)
[1, 2, 3]
>>> print(b)
[1, 2, 3, 1, 2, 3]
>>> a[1] = 10
>>> print(a)
[1, 10, 3]
>>> print(b)
[1, 2, 3, 1, 2, 3]

In this case, b is not a reference to a since the expression a*2 creates a new list. Then the statement b = a*2 gives b a reference to a*2 rather than a reference to a. All assignment operations create a reference. When you pass a list as a argument to a function you create a reference as well. Most of the time you don't have to worry about creating references rather than copies. However when you need to make modifications to one list without changing another name of the list you have to make sure that you have actually created a copy.

There are several ways to make a copy of a list. The simplest that works most of the time is the slice operator since it always makes a new list even if it is a slice of a whole list:

>>> a = [1, 2, 3]
>>> b = a[:]
>>> b[1] = 10
>>> print(a)
[1, 2, 3]
>>> print(b)
[1, 10, 3]

Taking the slice [:] creates a new copy of the list. However it only copies the outer list. Any sublist inside is still a references to the sublist in the original list. Therefore, when the list contains lists the inner lists have to be copied as well. You could do that manually but Python already contains a module to do it. You use the deepcopy function of the copy module:

>>> import copy
>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = a[:]
>>> c = copy.deepcopy(a)
>>> b[0][1] = 10
>>> c[1][1] = 12
>>> print(a)
[[1, 10, 3], [4, 5, 6]]
>>> print(b)
[[1, 10, 3], [4, 5, 6]]
>>> print(c)
[[1, 2, 3], [4, 12, 6]]
First of all, notice that a is an array of arrays. Then notice that when b[0][1] = 10 is run both a and b are changed, but c is not. This happens because the inner arrays are still references when the slice operator is used. However, with deepcopy, c was fully copied.

So, should I worry about references every time I use a function or =? The good news is that you only have to worry about references when using dictionaries and lists. Numbers and strings create references when assigned but every operation on numbers and strings that modifies them creates a new copy so you can never modify them unexpectedly. You do have to think about references when you are modifying a list or a dictionary.

By now you are probably wondering why are references used at all? The basic reason is speed. It is much faster to make a reference to a thousand element list than to copy all the elements. The other reason is that it allows you to have a function to modify the inputed list or dictionary. Just remember about references if you ever have some weird problem with data being changed when it shouldn't be.