List comprehensions are a way of transforming any list (or anything iterable) into another list, and they are often faster than the equivalent for loop.

First, a simple example. Imagine that you want to multiply each number from 0 to 9 by 2. This could be done with a loop as follows:

**Loop**

```python
numbers = []
for x in range(10):
    numbers.append(x * 2)
print(numbers)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

And this works fine – for each x produced by the range function (which runs from 0 to 9), it appends x*2 to the numbers list. However, once you scale this up beyond a simple multiplication, the overhead of the explicit loop starts to add up.

This is where we can use list comprehension. The above loop can be easily translated:

**List comprehension**

```python
numbers = [x * 2 for x in range(10)]
print(numbers)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

What this does is take the expression at the start of the square brackets (x * 2 in our case), evaluate it for each value of x produced by the range function, and build the same numbers list.

Starting with list comprehensions in simple cases like this, you can then expand them to more complex functions, as they can also handle nested loops. Replacing suitable loops with them can noticeably speed up your code. Initially I was put off by them because I struggled to understand how to use them with my own code, so I thought I would put up a blog post on the off chance that anyone else could benefit from this too.
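As a quick illustration of the nested-loop case (a toy example of mine, not from the original post): the for-clauses in a comprehension read left to right, in the same order as the equivalent nested loops.

```python
# Flatten a list of lists, both ways
matrix = [[1, 2], [3, 4], [5, 6]]

# Loop version
flat_loop = []
for row in matrix:
    for value in row:
        flat_loop.append(value)

# Comprehension version: "for row ... for value ..." mirrors
# the nesting order of the loops above
flat_comp = [value for row in matrix for value in row]

print(flat_comp)  # [1, 2, 3, 4, 5, 6]
```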

My greatest improvement was when I was creating my synthetic seismograms. To do this, I was creating noise, and then working out interevent times to add in my events. In order to merge the two, I needed to add the noise onto the event at the corresponding times (as otherwise the event would be literally the same each time and I was trying to make this as realistic as possible) and have the event at a magnitude determined by another variable.

Whenever I need to do something sequential, I always default to loops, as it seemed simple enough to go through each interevent time, get the relevant noise, and add the event (with a predetermined magnitude) to the noise portion to create a Stream of Traces. With 200 events, this step was taking me roughly **25 minutes!** Now, using list comprehension, this function takes only **7 seconds.** A great improvement!
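If you want to check the difference on your own machine, Python's built-in `timeit` module gives a quick comparison. This is an illustrative sketch of mine (the numbers are machine-dependent, and your real speedup will depend on what the loop body does):

```python
import timeit

loop_stmt = """
numbers = []
for x in range(1000):
    numbers.append(x * 2)
"""
comp_stmt = "numbers = [x * 2 for x in range(1000)]"

# Time each version over many repetitions; the comprehension avoids
# the repeated lookup and call of numbers.append inside the loop
t_loop = timeit.timeit(loop_stmt, number=1000)
t_comp = timeit.timeit(comp_stmt, number=1000)
print(f"loop: {t_loop:.3f}s  comprehension: {t_comp:.3f}s")
```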

I have included both my list comprehension version and my loop version below to show how the code changes between the two. Having taken the time to understand how they work, I have now been able to implement them in many more areas of my code, so other aspects like my cross-correlations are much faster because of this small change.

There are many great tutorials out there, and you should look into these if you are interested in optimising your code (especially if, like mine, it is full of loops). I’m always discovering new ways to improve my code, so I believe the process of learning to become a better coder is very much ongoing for me!

— Roseanne

```python
# make the noise trace into a numpy array
testnoise = np.array(noise_trace)

# create a Stream of the events (with Gutenberg-Richter magnitudes)
# with the added noise
st_events_poisson = Stream([Trace(testnoise[i:i + len(st_event)] +
                                  (j * np.array(st_event.data)))
                            for i, j in zip(poisson_times.astype('int'), g)])

# loop through times to change the stats
for i in range(len(poisson_times)):
    st_events_poisson[i].stats.starttime = st_events_poisson[i].stats.starttime + poisson_times[i]
    st_events_poisson[i].stats.sampling_rate = samp_rate
    st_events_poisson[i].stats.delta = delta
```

```python
# loop through for each interevent time
for i in range(len(poisson_times)):
    # make the noise trace into a numpy array
    testnoise = np.array(noise_trace)
    # find the noise portion for where the event occurs
    noise_portion = testnoise[(poisson_times[i] * int(samp_rate)):
                              (poisson_times[i] * int(samp_rate)) + int(len(st_event))]
    # add the event (multiplied by a Gutenberg-Richter magnitude)
    # onto the noise portion
    noise_plus_event_arr = noise_portion + (np.array(st_event.data) * g[i])
    # make this into a Trace and assign stats
    noise_plus_event = Trace(noise_plus_event_arr)
    noise_plus_event.stats.sampling_rate = samp_rate
    noise_plus_event.stats.delta = delta
    noise_plus_event.stats.starttime = noise_trace.stats.starttime + poisson_times[i]
    # append to the Stream of events
    st_events_poisson.append(noise_plus_event)
```


But first, some background. Mount St. Helens is a volcano in the Cascade Range, located in southwestern Washington State, USA. The Cascade Range extends all the way from British Columbia to California and houses many volcanoes among its mountains. The reason for this is that it is part of the Ring of Fire: a zone encircling the Pacific Ocean where the majority of the world's earthquakes and volcanic eruptions occur.

Mount St. Helens is still an active volcano to this day, with several recorded major explosive eruptions and many smaller eruptions in its history. 1980-1986 was one of these periods of eruptive activity, with increased seismicity and explosive activity that resulted in 57 deaths.

From 1989 to 2001, Mount St. Helens again had periods of increased seismicity as a result of hydrothermal gas explosions. After this, it returned to a state of rest until 2004, when it reawakened.

From 2004-2008, Mount St Helens exhibited increased seismicity again. This was unlike the previous awake periods in that it didn’t actually have that many explosive events (only two! The 1980-1986 period had 17 lava dome-building episodes and hundreds of small gas and steam explosions). The other interesting quality of this reawakened period was the type of seismicity that was occurring: small, regularly-spaced earthquakes repeatedly occurring during the eruptions. They are nicknamed “drumbeats” due to their resemblance to the steady sound pattern produced by the beating of a drum.

A day's worth of seismicity during this period can be seen below, where each horizontal line represents 90 minutes.

We can even zoom into this and look at a 4 hour block (each horizontal line this time is only 30 minutes).

The repetitiveness of these small earthquakes is very clear to see in these images. This led scientists to wonder, *what is causing these drumbeats?*

Theory one (Iverson et al., 2006; Iverson, 2008; Anderson et al., 2010)

The drumbeats were due to stick-slip motion of a piece of hardened magma (a conduit plug) being forced upwards by ascending magma through the conduit (the vent which carries magma from the magma chamber to the surface). As the plug is forced up the conduit, it repeatedly sticks and slips against the conduit walls, and this repetition could be what causes the drumbeats.

Theory two (Waite et al., 2008)

The volcano is essentially acting like a steam engine. This would be due to there being a complicated crack system (think like those plumber games where you want to connect up all the pipes for the flow of water to begin) and a steady supply of heat and fluid from the magma chamber. This would then also cause the drumbeats to occur, similar to a train choo-chooing.

Some great analysis has been done on the similarity of these seismic signals (see References), as *if the drumbeats are similar, it means that they have come from effectively the same source.* This is where methods such as my correlation matrix become handy, as this measures how well correlated events are with one another. With this analysis, we can then see which events are true repeating events.

Mount St Helens is a great case study for building up any algorithm that focuses on finding any sort of pattern in seismic data, which is why I have been looking into it. This can then feed into our analysis of repeating earthquakes, although I doubt we will ever get as clean a signal as these drumbeats!

–Roseanne

**References**

Anderson, K., Lisowski, M., and Segall, P. (2010). Cyclic ground tilt associated with the 2004-2008 eruption of Mount St. Helens. Journal of Geophysical Research: Solid Earth, 115(11):1–29.

Iverson, R. M. (2008). Dynamics of seismogenic volcanic extrusion resisted by a solid surface plug, Mount St. Helens, 2004-2005. In Sherrod, D., Scott, W., and Stauffer, P., editors, A Volcano Rekindled: The Renewed Eruption of Mount St. Helens 2004-2006, U.S. Geological Survey Professional Paper 1750, chapter 21, pages 425–460. USGS.

Iverson, R. M., Dzurisin, D., Gardner, C. A., Gerlach, T. M., LaHusen, R. G., Lisowski, M., Major, J. J., Malone, S. D., Messerich, J. A., Moran, S. C., Pallister, J. S., Qamar, A. I., Schilling, S. P., and Vallance, J. W. (2006). Dynamics of seismogenic volcanic extrusion at Mount St Helens in 2004-05. Nature, 444(7118):439–443.

Waite, G. P., Chouet, B. A., and Dawson, P. B. (2008). Eruption dynamics at Mount St. Helens imaged from broadband seismic waveforms: Interaction of the shallow magmatic and hydrothermal systems. Journal of Geophysical Research: Solid Earth, 113(2):1–22.

For those who have not seen these matrices before, they show the similarity between different arrays. If two arrays have a correlation value of 1.0, they have a **perfect correlation** (i.e. they are exactly the same), and a correlation value of 0.0 means there is no similarity between the two at all. This can be used to compare datasets with one another if you are looking for a similar pattern.

Also, it is worth noting that one of the principal statements made in statistics is that,

“Correlation does not imply causation”

So you should also have some further information to back up the correlation between arrays.

An example of one of these correlation matrices can be seen below, which shows the comparison of 54 arrays with each other (i.e. I have taken each array and cross-correlated it with the other 53 arrays). The squares with a darker tone have a higher correlation than those with a lighter tone.
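As a toy sketch of how such a matrix is built (my own example, not the blog's data), `numpy.corrcoef` computes the full symmetric correlation matrix from a set of arrays:

```python
import numpy as np

# Three toy "arrays" (think waveforms): the second is a scaled copy
# of the first, the third is independent random noise
rng = np.random.default_rng(42)
a = rng.standard_normal(100)
b = 2.0 * a
c = rng.standard_normal(100)

# np.corrcoef treats each row as one array and returns the
# matrix of pairwise correlation values
corr = np.corrcoef([a, b, c])
print(np.round(corr, 2))
```

Because `b` is just a scaled copy of `a`, their entry in the matrix is exactly 1.0, while the entries involving `c` sit near zero.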

Your first step is putting your correlation values into a `pandas.DataFrame`; you can then just use the code below to create the matrix! The DataFrame should contain the full dataset, and the code then renders it as this triangle shape (as otherwise you end up with a mirror image across the diagonal). I have used absolute values as I didn’t want to deal with negative correlation at this stage (a correlation of -1.0 is a perfect match, but with the signal inverted).

If you don’t have any correlation values, I’d recommend reading up on cross-correlation, which is a function where you can obtain these correlation values. I might produce a blog post on this at a later date, but it is worth reading into it yourself so that you can fully understand the output.
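To give a feel for where the values come from (a minimal sketch of mine, not the post's own cross-correlation code), the zero-lag normalised correlation of two signals is what fills each cell of the matrix:

```python
import numpy as np

def normalised_correlation(x, y):
    """Normalised correlation at zero lag: +1 for identical signals,
    -1 for an inverted copy, near 0 for unrelated signals."""
    x = x - x.mean()
    y = y - y.mean()
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# A simple sine wave compared against itself and its inverse
t = np.linspace(0, 1, 200)
sig = np.sin(2 * np.pi * 5 * t)

print(round(normalised_correlation(sig, sig), 2))   # 1.0
print(round(normalised_correlation(sig, -sig), 2))  # -1.0
```

For real waveform work you would slide one signal against the other and take the peak value, which is what full cross-correlation routines do.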

— Roseanne

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(font_scale=1.5)

def corr_mat_plot(correlation_mat, show=True, outfile=None):
    """
    Plots the correlation matrix in an image plot to show where the
    highest correlation between arrays is.
    """
    # mask the upper triangle so the plot doesn't mirror the values
    mask = np.zeros_like(correlation_mat, dtype=bool)
    mask[np.triu_indices_from(mask)] = True

    # set up the figure
    fig, ax = plt.subplots(figsize=(10, 10))

    # draw the matrix (absolute values, so negative correlation
    # is treated the same as positive)
    sns.heatmap(np.abs(correlation_mat),
                cmap=sns.cubehelix_palette(8, as_cmap=True),
                mask=mask, vmin=0, vmax=1, square=True,
                xticklabels=50, yticklabels=50,
                cbar_kws={"shrink": .8, "label": "Correlation value"},
                ax=ax)
    plt.title("Correlation between the arrays")

    if outfile:
        fig.savefig(outfile)
    if show:
        plt.show()
    else:
        return fig
```

**Gutenberg-Richter**

The Gutenberg-Richter law is a relationship which every seismologist knows – for those who are not so aware (like me just over a year ago), it refers to an expression which relates the total number of earthquakes in any given region to the magnitude, by the following equation:

log10(N) = a - bM

where N is the total number of earthquakes of at least magnitude M, a is a constant, and b is another constant which depends on the seismicity in the area (close to 1 in seismically active areas). This can also be seen in the plot below.

What this expression does, is relate the frequency of earthquakes with their magnitude, i.e., there are lots of small earthquakes, and very few large earthquakes – makes sense.
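A quick numerical sketch of this (my own illustration, with an arbitrary value for a): with b = 1, each unit increase in magnitude means ten times fewer earthquakes.

```python
a, b = 4.0, 1.0  # illustrative constants, not fitted to any real catalogue

def n_events(M):
    """Gutenberg-Richter: log10(N) = a - b*M, so N = 10**(a - b*M)."""
    return 10 ** (a - b * M)

# With b = 1, magnitude 5 events are 10x more frequent than magnitude 6
print(n_events(5.0) / n_events(6.0))  # 10.0
```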

At the moment, I am creating synthetic seismograms (see Make some noise for how to make the seismic noise), and as I am trying to make my seismograms as realistic as possible, it is only logical to have my seismic events follow a Gutenberg-Richter distribution as well. I have also added a term for setting a minimum magnitude, as there is often a ‘fall-off’ in magnitudes at the lower end, because smaller events are harder to actually pick up in real life.

**Poisson distribution**

You are probably wondering where the fish part of my title comes into play – well that’s because when I add my events, I am doing so with Poisson spaced inter-event times (also below), with magnitudes that follow this distribution (i.e., lots of small and few large earthquakes). For those still not following, Poisson = fish in French.. (ba dum tss)

Anyways, Poisson is used for the spacing of inter-event times as earthquake occurrence is commonly modelled as a Poisson process. This is a rule which assigns probabilities to the number of occurrences in a time interval, given a known average rate. This can be seen in the mathematical formula below,

P(t) = 1 - e^(-t/T)

where the left term is the probability of at least one earthquake occurring within the time t, and T is the average recurrence time. This can also be written as 1 - e^(-λt), where λ is the rate (i.e. λ = 1/T).

So, if the average recurrence time were 31 days, then after 25 days there would be a 55% probability of at least one event. A Poisson distribution can be easily incorporated, as we just need to produce random numbers which follow this distribution, as seen in the code at the end of this post.
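The worked numbers above can be checked directly from the formula (a two-line sketch of mine):

```python
import math

def prob_at_least_one(t, recurrence):
    """P(at least one event within time t) = 1 - exp(-t / T),
    where T is the mean recurrence time (so the rate is lambda = 1/T)."""
    return 1.0 - math.exp(-t / recurrence)

# The example from the text: T = 31 days, t = 25 days
p = prob_at_least_one(25, 31)
print(round(p, 2))  # 0.55
```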

In summary, I utilise both Gutenberg-Richter and Poisson statistics for my events: the magnitudes are scaled following Gutenberg-Richter, and the events are spaced following a Poisson distribution. I have supplied both functions (including how to do the Gutenberg-Richter plot) below.

— Roseanne

```python
import random

import numpy as np
import matplotlib.pyplot as plt

def gutenberg_richter(b=1.0, size=1, mag_min=0.0):
    """Generate a sequence of earthquake magnitudes according to the
    G-R law, logN = a - bM.
    Returns both the G-R magnitudes and the normalised version.
    """
    g = mag_min + np.log10(-np.random.rand(size) + 1.0) / (-1 * b)
    gn = g / g.max()
    return g, gn

# code for plotting the G-R distribution
g, gn = gutenberg_richter(size=10**8)
y, bine = np.histogram(g)
binc = 0.5 * (bine[1:] + bine[:-1])
plt.plot(binc, y, '.-')
plt.yscale('log', nonposy='clip')
plt.xlabel("Magnitude")
plt.ylabel("Frequency (log scale)")

def poisson_interevent(lamb, number_of_events, st_event_2, samp_rate):
    """
    Finds the Poisson inter-event times for the events, given lamb and
    number_of_events. We can use random.expovariate, as this generates
    exponentially distributed random numbers with a rate of lambda.
    Taking the cumulative sum of these values gives the times at which
    to place the events with Poisson inter-event times. If any two
    events would overlap, the whole set of times is redrawn.

    lamb = lambda value for Poisson
    number_of_events = how many events you want
    st_event_2 = your event
    samp_rate = sampling rate
    """
    poisson_values = 0
    while poisson_values == 0:
        poisson_values = [int(random.expovariate(lamb))
                          for i in range(number_of_events)]
        poisson_times = np.cumsum(poisson_values)
        for i in range(len(poisson_times) - 1):
            if poisson_times[i + 1] - poisson_times[i] <= len(st_event_2) / samp_rate:
                poisson_values = 0
    return poisson_values, poisson_times
```

- Load in some typical seismic noise (I took mine from a quiet day near the Tungurahua volcano in Ecuador), which has been detrended and demeaned.
- Take the Fast Fourier Transform (FFT) of this (this puts the data into the frequency domain).
- Smooth the FFT data.
- Multiply this by the FFT of white noise.
- Take the Inverse Fast Fourier Transform (IFFT) of this (this takes it back into the time domain).
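These steps can be sketched in a few lines. This is a simplified stand-in of mine (random data in place of a real recorded trace, and a plain moving average for the smoothing step), not the full function from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real, detrended/demeaned seismic noise
real_noise = rng.standard_normal(4096)

# 1. FFT of the real noise (into the frequency domain)
spec = np.fft.rfft(real_noise)

# 2. Smooth the amplitude spectrum with a simple moving average
window = 20
amp = np.convolve(np.abs(spec), np.ones(window) / window, mode="same")

# 3. Multiply by the FFT of white noise, imprinting the spectral
#    shape of the real noise onto the white noise
white = rng.standard_normal(real_noise.size)
shaped_spec = np.fft.rfft(white) * amp

# 4. Inverse FFT back into the time domain
synthetic = np.fft.irfft(shaped_spec, n=real_noise.size)

print(synthetic.shape)  # (4096,)
```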

The results of this are shown below, where the green is our white noise, the blue is our real seismic noise, and the pink is our synthetic seismic noise.

There are a few other intermediate steps in this code (such as looping through so that it is done in segments), but it is quite a simple process! A few other libraries are loaded beforehand, such as Obspy and Numpy, though you will probably have these imported already if you are doing this.

Now go and make some noise!

— Roseanne

```python
def noise_segmenting(poisson_times, st_event_2, st_t, noise_level, samp_rate, delta):
    """
    Creates the noise array so that it is big enough to host all of the
    events. The noise is created by multiplying white noise by the
    seismic noise in the frequency domain. We then inverse FFT it and
    scale it to whatever SNR level is defined to output the full noise
    array.

    poisson_times = array of times where we then put in the seismic
                    events (boundary for the noise)
    st_event_2 = size of events that we are putting in later (again,
                 this is a boundary)
    st_t = seismic noise array that you are basing your synthetic on
    samp_rate, delta = trace properties of st_t
    """
    # end time for noise to cover all events (some time after the last event)
    noise_lim = (poisson_times[-1] + len(st_event_2)) * 2

    # load in seismic noise to base the synthetic type on
    st_noise_start_t = UTCDateTime("2015-01-22T01:00:00")
    st_noise_end_t = UTCDateTime("2015-01-22T01:02:00")
    test_trace = st_t[0].slice(st_noise_start_t, st_noise_end_t)
    test_trace_length = int(len(test_trace) / test_trace.stats.sampling_rate)

    # setting the boundary: work out how many 2 minute loops we need
    minutes_long = noise_lim / st_event_2.stats.sampling_rate
    noise_loops = int(np.ceil(minutes_long / 2.0))

    # zero array
    noise_array = np.zeros([noise_loops, len(test_trace)])

    # loop for the amount of noise_loops needed (in segments)
    for j in range(noise_loops):
        # average the seismic noise over twenty 2 minute demeaned samples
        tung_n_fft = np.zeros([20, int(np.ceil(len(test_trace) / 2.0))])
        for i in range(20):
            st_noise = st_t[0].slice(st_noise_start_t + (i * test_trace_length),
                                     st_noise_end_t + (i * test_trace_length))
            noise_detrended = st_noise.detrend()
            noise_demeaned = mlab.demean(noise_detrended)
            noise_averaging = Trace(noise_demeaned).normalize()
            tung_n_fft[i] = np.fft.rfft(noise_averaging.data)

        # work out the average fft
        ave = np.average(tung_n_fft, axis=0)
        # smooth the data (movingaverage is a small helper defined elsewhere)
        aves = movingaverage(ave, 20)

        # create white noise
        whitenoise = np.random.normal(0, 1, len(noise_averaging))
        whitenoise_n = Trace(whitenoise).normalize()
        # FFT the white noise
        wn_n_fft = np.fft.rfft(whitenoise_n.data)
        # multiply the FFT of white noise and the FFT smoothed seismic noise
        newnoise_fft = wn_n_fft * aves
        # IFFT the product
        newnoise = ifft(newnoise_fft, n=len(st_noise))
        noise_array[j] = np.real(newnoise)

    # transform the noise into an Obspy trace
    full_noise_array = np.ravel(noise_array)
    full_noise_array_n = Trace(np.float32(full_noise_array)).normalize()
    full_noise_array_n_scaled = Trace(np.multiply(full_noise_array_n, noise_level))
    full_noise_array_n_scaled.stats.sampling_rate = samp_rate
    full_noise_array_n_scaled.stats.delta = delta

    return full_noise_array_n_scaled
```

Through my DTP, I was awarded funding to undertake a 2 week internship at a business, in order to get some real-life experience. I think we all know that 2 weeks is not quite long enough to do much, but it is really useful for forging connections between my own research interests and industry work. In my case, the connection was with Risk Management Solutions (RMS), a company that models catastrophe risk for insurance purposes. Their work ranges from modelling the risks associated with earthquakes (so right up my street) to terrorism risk.

My work entailed looking into multi-fault ruptures in California. Multi-fault ruptures were once thought of as a rare case of earthquakes ‘jumping’ from fault to fault. However, they are a deadly occurrence, as multi-fault ruptures bring larger earthquakes. Cases such as the M7.8 2016 Kaikoura, New Zealand earthquake show just how massive multi-fault ruptures can be: this particular earthquake is reported to have ruptured 21 different faults and caused about 180 km of surface rupture.

This is probably one of the most complex earthquake cases, as there are just so many faults involved. It also brings up the question: *what if there were a similar case in California?* It could be deadly.

As our understanding of earthquakes is ever evolving, it is important that earthquake forecast models incorporate this information. The Uniform California Earthquake Rupture Forecast, Version 3 (UCERF3) is a new earthquake forecast model for the whole of California, developed by a breadth of specialists. It is a highly advanced model, estimating the magnitude, location, and likelihood of potential earthquakes. UCERF3 is particularly new and innovative compared with other models because it incorporates multi-fault ruptures; including these types of ruptures in the forecast increased the estimated likelihood of a larger earthquake (M>7).

The overall likelihood of a magnitude 6.7 (or higher) earthquake occurring within the next 30 years was calculated by UCERF3 and is shown below. The research done for the UCERF3 model not only shows the increased likelihood of earthquakes in California, but also how interconnected the whole fault system is.

I would recommend checking out UCERF3 if you are interested – it’s fascinating being able to see all the different fault sections and how they are connected to one another.

It’s a shame I only had two weeks to look into this area – hopefully I will get the chance to revisit it again to be able to apply the model myself.

–Roseanne
