One of the problems with teaching yourself to code as you go along is that you often miss out on tricks (at least, that has been my experience). I started out coding with MATLAB during my undergraduate degree and moved over to Python once I started my PhD, so I have very much learnt on the job, picking things up on a need-to-know basis. However, as my code needs to handle more and more data, I am frequently having to go back and optimise it to make it faster. And so, my new favourite way to improve my code is to use list comprehensions in place of loops.
List comprehensions are a way of transforming any list (or anything iterable) into another list, and they are often much faster than the equivalent for loop.
First, a simple example. Imagine that you want to know the result of multiplying each number from 0 to 9 by 2. This could be done with a loop as follows:
```python
numbers = []
for x in range(10):
    numbers.append(x * 2)
print(numbers)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```
And this works fine – it appends each x*2 term to the numbers list, with the range function providing values of x from 0 to 9. However, if you were to scale this up to do more than a simple multiplication, the loop can start to become noticeably slow.
This is where we can use a list comprehension. The above loop can be easily translated:
```python
numbers = [x * 2 for x in range(10)]
print(numbers)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```
What this does is take the expression at the start of the square brackets (x * 2 in our case) and evaluate it for each value of x produced by the range function, giving the same result for the numbers list.
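If you want to check the speed difference on your own machine, the standard library's timeit module can compare the two versions directly. This is just a minimal sketch (the function names and sizes are my own choices, and the exact timings will vary between machines):

```python
import timeit

def with_loop():
    # build the list by appending inside a for loop
    numbers = []
    for x in range(10000):
        numbers.append(x * 2)
    return numbers

def with_comprehension():
    # build the same list with a list comprehension
    return [x * 2 for x in range(10000)]

# time each version over 1000 runs
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)
print("loop:", loop_time, "comprehension:", comp_time)
```

On my runs the comprehension comes out ahead, largely because it avoids the repeated append method calls inside the loop.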
From using list comprehensions very simply at first, it is then possible to expand them for more complex functions, as they can also handle nested loops. Thus you can often greatly improve the speed of your code by replacing any suitable loops with them. Initially I was put off by them because I struggled to understand how to use them with my own code, and so I thought I would put up a blog post on the off chance that anyone else could benefit from this too.
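As a small illustration of the nested-loop case (my own toy example, not from my seismogram code), here is a loop over a list of lists and its comprehension equivalent. The clauses in the comprehension read left to right in the same order as the original for statements:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# nested loop version
flat = []
for row in matrix:
    for value in row:
        flat.append(value * 2)

# equivalent comprehension: the for clauses appear in the same order
flat_comp = [value * 2 for row in matrix for value in row]
print(flat_comp)
# [2, 4, 6, 8, 10, 12, 14, 16, 18]
```

You can also tack an if clause onto the end to filter values, e.g. `[value * 2 for row in matrix for value in row if value > 4]`.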
My greatest improvement was when I was creating my synthetic seismograms. To do this, I was creating noise, and then working out interevent times to add in my events. In order to merge the two, I needed to add the noise onto the event at the corresponding times (as otherwise the event would be literally the same each time and I was trying to make this as realistic as possible) and have the event at a magnitude determined by another variable.
Whenever I need to do something sequential, I always default to loops, and it seemed simple enough here to go through each interevent time, get the relevant noise, and add the event (with a predetermined magnitude) to the noise portion to create a Stream of Traces. With 200 events, this step was taking me roughly 25 minutes! But now, using a list comprehension, this function takes only 7 seconds. A great improvement!
I have included below both my list comprehension version and my loop version to show how the code changes between the two. Having taken the time to understand how they work, I have now been able to implement them in many more areas of my code, so other parts, like my cross-correlations, are much faster because of this small change.
There are many great tutorials out there, so you should look into these if you are interested in optimising your code (especially if it is full of loops, like mine was). I'm always discovering new ways in which I can improve my code, so I believe that the process of learning to become a better coder is very much ongoing for me!
Using list comprehensions: 7 seconds
```python
# make the noise trace into a numpy array
testnoise = np.array(noise_trace)

# create a Stream of the events (with Gutenberg-Richter magnitudes)
# with the added noise
st_events_poisson = Stream([Trace(testnoise[i:i + len(st_event)] + (j * np.array(st_event.data)))
                            for i, j in zip(poisson_times.astype('int'), g)])

# loop through times to change the stats
for i in range(0, len(poisson_times)):
    st_events_poisson[i].stats.starttime = st_events_poisson[i].stats.starttime + poisson_times[i]
    st_events_poisson[i].stats.sampling_rate = samp_rate
    st_events_poisson[i].stats.delta = delta
```
Using loops: 25 minutes
```python
# loop through for each interevent time
for i in range(0, len(poisson_times)):
    # make the noise trace into a numpy array
    testnoise = np.array(noise_trace)
    # find the noise portion for where the event occurs
    noise_portion = testnoise[(poisson_times[i] * int(samp_rate)):
                              (poisson_times[i] * int(samp_rate)) + int(len(st_event))]
    # add the event (multiplied by a Gutenberg-Richter magnitude)
    # onto the noise portion
    noise_plus_event_arr = noise_portion + (np.array(st_event.data) * g[i])
    # make this into a Trace and assign stats
    noise_plus_event = Trace(noise_plus_event_arr)
    noise_plus_event.stats.sampling_rate = samp_rate
    noise_plus_event.stats.delta = delta
    noise_plus_event.stats.starttime = noise_trace.stats.starttime + poisson_times[i]
    # create Stream of events
    st_events_poisson.append(noise_plus_event)
```