<h1>Google’s Python Style Guide Part 1 - Functionality</h1>
<p><em>Emma Benjaminson - 11 Jul 2021</em></p>
<p>I just recently learned that Google published a style guide for how their developers write clean code in Python [1]. I wanted to use a couple of posts to outline some of the things I learned from that style guide. This post describes some of the functional recommendations given in the style guide, and a follow-up post will detail some of the specific style requirements Google lists. Let’s get started!</p>
<h2 id="googles-python-style-guide---functional-recommendations">Google’s Python Style Guide - Functional Recommendations</h2>
<p>The first half of Google’s style guide focuses on best practices for using different functionalities within Python. I should note that there are more recommendations than I am giving here - I have selected the items that were relevant to aspects of Python that I already use or want to use more frequently. I would highly recommend glancing through the style guide yourself if you want a more complete picture of Google’s recommendations. But for now, here is what I thought was important [1]:</p>
<p><strong>Use a code linter.</strong> A code linter is a tool that looks at code and identifies possible errors, bugs or sections that are poorly written and could contain syntax errors [2]. Google recommends using a Python library like pylint to check your code before deploying it.</p>
<p><strong>Use import statements for packages and modules but not individual classes or functions.</strong> I think this recommendation helps with namespace management - if you only import complete packages/modules, then you can always trace specific classes or functions back to those libraries (i.e. we know that module.class is a class that belongs to “module”). This practice also helps prevent collisions (i.e. having multiple functions with the same name).</p>
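<p>A small illustration of this convention using standard-library modules (my own example, not one from the guide):</p>

```python
# Preferred: import the module, then qualify names through it, so every
# call site shows which module a name came from.
import json
import os.path

data = json.dumps({"a": 1})                # clearly json's dumps
base = os.path.basename("/tmp/notes.txt")  # clearly os.path's basename

# Discouraged: `from json import dumps` - a bare `dumps` later in the file
# gives no hint of its origin and can collide with another module's `dumps`.
print(data, base)
```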
<p><strong>Import modules by full pathname location.</strong> This is important for helping the code to find modules correctly. Google recommends writing this:</p>
<p><code class="language-plaintext highlighter-rouge">from doctor.who import jodie</code></p>
<p>Instead of writing this:</p>
<p><code class="language-plaintext highlighter-rouge">import jodie</code></p>
<p><strong>Use exceptions carefully.</strong> Exceptions are meant for breaking out of the normal control flow to handle specific errors and special cases. Google recommends using built-in exception classes (like KeyError, ValueError, etc. [3]) whenever possible. Avoid a bare “except:” statement on its own, because it catches many situations that you probably don’t want to handle. Similarly, keep the amount of code inside a try-except block small, and use “finally” to guarantee that essential cleanup actions (like closing files) always run.</p>
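<p>A short sketch applying these rules (the function and its fallback behavior are my own illustrative choices):</p>

```python
def read_config(path):
    """Read a file's text, following the exception guidelines above."""
    f = open(path)
    try:
        return f.read()              # keep the try block small
    except OSError as err:           # catch a specific built-in exception,
        print(f"could not read {path}: {err}")  # never a bare `except:`
        return ""
    finally:
        f.close()                    # `finally` guarantees the file is closed
```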
<p><strong>Do not use global variables.</strong> Global variables are variables whose scope spans an entire module or class. Python does not have a dedicated datatype for constants like some other languages, but you can still create them stylistically [4], for example by writing them as _MY_CONSTANT = 13. The underscore at the beginning of the variable name indicates that the variable is internal to the module or class that uses it.</p>
<p><strong>It is okay to use comprehensions and generators in simple cases, but avoid using them for more complicated situations.</strong> Comprehensions*1 and generators*2 are really useful because they do not require for loops, and they are elegant and easy to read. Generators in particular use very little memory. However, complicated constructions of comprehensions/generators can make your code more opaque. Generally, Google recommends using comprehensions/generators as long as they fit on one line or the individual components can be separated onto individual lines.</p>
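<p>For instance, a sketch of the distinction:</p>

```python
# Fine: short, fits on one line, easy to read.
squares = [x * x for x in range(10) if x % 2 == 0]
print(squares)  # [0, 4, 16, 36, 64]

# Too clever for a comprehension - multiple loops and conditions are
# easier to follow when written as an explicit for loop:
# pairs = [(x, y) for x in range(5) if x > 1 for y in range(5) if y % 2]
```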
<p><strong>Use default iterators and operators for data types that support them.</strong> Some data types, like lists and dictionaries, support specific iterator keywords like “in” and “not in.” It’s acceptable to use these iterators because they are simple, readable and efficient, but you want to make sure that you do not change a container when you are iterating over it (since lists and dictionaries are <a href="https://sassafras13.github.io/MutvsImmut/">mutable objects</a> in Python).</p>
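<p>A brief sketch of both points (the data here is hypothetical):</p>

```python
inventory = {"apples": 3, "pears": 0}

# Default membership tests and iteration: simple, readable, efficient.
assert "apples" in inventory
total = sum(inventory.values())

# Don't delete entries while iterating over the dict itself; collect
# the keys first, then mutate the container afterwards.
empty = [name for name, count in inventory.items() if count == 0]
for name in empty:
    del inventory[name]
print(inventory)  # {'apples': 3}
```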
<p><strong>Lambda functions are acceptable as one-liners.</strong> Lambda functions define brief functions in an expression, such as [7]:</p>
<p><code class="language-plaintext highlighter-rouge">(lambda x: x + 1)(2) = 2 + 1 = 3</code></p>
<p>They are convenient but hard to read and debug. They also are not explicitly named, which can be a problem. Google recommends that if your lambda function is longer than 60 to 80 characters, then you should just write a proper function instead.</p>
<p><strong>Default argument values can be useful in function definitions.</strong> You can assign default values to specific arguments to a function. You always want to place these parameters last in the list of arguments for a given function. This is a good practice when the normal use case for a function requires default values, but you want to give the user the ability to override those values in special circumstances. One downside to this practice is that the defaults are only evaluated <em>once</em> when the module containing the function is loaded. If the argument’s value is <em>mutable</em>, and it gets modified during runtime, then the default value for the function has been modified <em>for all future uses</em> of that function!*3 So the best practice to avoid this issue is to make sure that you do not use mutable objects as default values for function arguments.</p>
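<p>A minimal demonstration of the mutable-default gotcha (function names are my own):</p>

```python
def append_bad(item, bucket=[]):      # BUG: the [] is created once, at import
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):   # standard fix: use None as a sentinel
    if bucket is None:
        bucket = []                   # a fresh list on every call
    bucket.append(item)
    return bucket

append_bad(1)
print(append_bad(2))   # [1, 2] - the shared default remembered the 1!
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```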
<p><strong>Use implicit false whenever possible.</strong> All empty values are considered false in a Boolean context, which can really help with improving your code readability. For example, this is how we would write an implicit false:</p>
<p><code class="language-plaintext highlighter-rouge">if foo: …</code></p>
<p>This is the explicit version, which is not as clean:</p>
<p><code class="language-plaintext highlighter-rouge">if foo != [ ]: …</code></p>
<p>Not only is the implicit approach cleaner, it is also less error prone. The only exception is <strong>if you are checking integers</strong>, when you want to be explicit, i.e.:</p>
<p><code class="language-plaintext highlighter-rouge">if foo == 0: …</code></p>
<p>In this case you want to be clear about whether you want to know if the integer variable’s value is zero, or if it is simply empty (in which case you would use “if foo is None”). Also, remember that empty sequences are false - you don’t need to check if they’re empty using “len(sequence)”.</p>
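<p>A sketch pulling these rules together (a hypothetical function of my own):</p>

```python
def describe(items, count=None):
    """Illustrates implicit-false checks vs. explicit integer checks."""
    if not items:             # implicit false: covers [] and {} - no len() needed
        return "empty"
    if count == 0:            # integers are compared explicitly...
        return "nothing requested"
    if count is None:         # ...so unset (None) and 0 stay distinguishable
        count = len(items)
    return f"{count} item(s)"

print(describe([]))           # empty
print(describe([1, 2], 0))    # nothing requested
print(describe([1, 2]))       # 2 item(s)
```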
<p><strong>Annotate code with type hints.</strong> This is especially good practice for function definitions. It helps with readability and maintainability of your code. It often looks like this:</p>
<pre><code class="language-python">def my_func(a: int) -> list[int]:
    ...
</code></pre>
<p>That is all for today’s post on functional recommendations in Google’s Python style guide. Next time, I will write more specifically about the stylistic recommendations that Google provides for coding in Python. Thanks for reading!</p>
<h2 id="footnotes">Footnotes</h2>
<p>*1 Comprehensions are a tool in Python that let you iterate over certain data types like lists, sets, or generators. They can make your code more elegant, and allow you to generate iterables in one line of code. The syntax for a comprehension looks like this [5]:
<code class="language-plaintext highlighter-rouge">new_list = [expression for member in iterable]</code></p>
<p>*2 Generator functions are useful for iterating over really large datasets. They are called “lazy iterators” because they do not store their internal state in memory. They also use the “yield” statement instead of the “return” statement. This means that they can send a value back to the code that is calling the generator function, but they don’t have to exit after they have returned, as in a regular function. This allows generator functions to remember their state. In this way generators are very memory efficient but allow for iteration similar to comprehensions [6].</p>
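<p>A minimal generator sketch illustrating this footnote:</p>

```python
def countdown(n):
    """A generator: produces values lazily instead of building a list."""
    while n > 0:
        yield n     # pause here; resume from this point on the next request
        n -= 1

gen = countdown(3)
print(next(gen))    # 3 - the generator remembers its state between calls
print(list(gen))    # [2, 1]
```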
<p>*3 This happened to a classmate of mine once, and he said it almost ruined a paper submission for him. This is covered in detail in [8].</p>
<h2 id="references">References</h2>
<p>[1] “Google Python Style Guide.” <a href="https://google.github.io/styleguide/pyguide.html">https://google.github.io/styleguide/pyguide.html</a> Visited 11 Jul 2021.</p>
<p>[2] Mallett, E. E. “Code Lint - What is it? What can help?” DCCoder. 20 Aug 2018. <a href="https://dccoder.com/2018/08/20/code-lint/">https://dccoder.com/2018/08/20/code-lint/</a> Visited 28 Jun 2021.</p>
<p>[3] “Built-in Exceptions.” The Python Standard Library. <a href="https://docs.python.org/3/library/exceptions.html">https://docs.python.org/3/library/exceptions.html</a> Visited 11 Jul 2021.</p>
<p>[4] Hsu, J. “Does Python Have Constants?” Better Programming on Medium. 7 Jan 2020. <a href="https://betterprogramming.pub/does-python-have-constants-3b8249dc8b7b">https://betterprogramming.pub/does-python-have-constants-3b8249dc8b7b</a> Visited 11 Jul 2021.</p>
<p>[5] Timmins, J. “When to Use a List Comprehension in Python.” Real Python. <a href="https://realpython.com/list-comprehension-python/">https://realpython.com/list-comprehension-python/</a> Visited 11 Jul 2021.</p>
<p>[6] Stratis, K. “How to Use Generators and yield in Python.” Real Python. <a href="https://realpython.com/introduction-to-python-generators/">https://realpython.com/introduction-to-python-generators/</a> Visited 11 Jul 2021.</p>
<p>[7] Burgaud, A. “How to Use Python Lambda Functions.” Real Python. <a href="https://realpython.com/python-lambda/">https://realpython.com/python-lambda/</a> Visited 11 Jul 2021.</p>
<p>[8] Reitz, K. “Common Gotchas.” The Hitchhiker’s Guide to Python. <a href="https://docs.python-guide.org/writing/gotchas/">https://docs.python-guide.org/writing/gotchas/</a> Visited 11 Jul 2021.</p>
<h1>Week of July 5 Paper Reading</h1>
<p><em>6 Jul 2021</em></p>
<p>This week I have been interested in reading papers about how to model time series data using unsupervised methods in machine learning. I will briefly summarize a couple of papers on the topic below.</p>
<h2 id="paper-1-velc-a-new-variational-auto-encoder-based-model-for-time-series-anomaly-detection-by-zhang-et-al">Paper 1: VELC: A New Variational Auto Encoder Based Model for Time Series Anomaly Detection by Zhang et al.</h2>
<p>This paper presents a method for finding anomalies in time series data using variational autoencoders. I did not know what <strong>anomaly detection</strong> really was until I read this paper - it is essentially the practice of looking for rare events in the data that are very different from the rest of the dataset, but are likely to be important, not random noise. Anomaly detection can be really difficult to do in a supervised fashion because the size of the anomaly class will generally be much smaller than the size of the “normal” class. But this paper proposes an unsupervised learning approach that side-steps that problem [1].</p>
<p>The authors introduce a VAE that has an additional re-Encoder and Latent Constraint network (VELC) that helps the model tell the difference between normal and anomalous data based on how well the model can reconstruct the input data. The basic idea here is that the model is trained to encode and decode normal data, and as training progresses it will minimize its reconstruction error (i.e. how different the reconstructed data is from the original input data). Then when the model is given a mix of normal and anomalous test data, the reconstruction error should increase dramatically for the anomalous samples as compared to the normal samples, indicating which samples are anomalous. So if the reconstruction error is small, the input is normal; if the error is large, the input is anomalous [1].</p>
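<p>The decision rule described here can be sketched in a few lines (this is only the generic reconstruction-error test from the text, not VELC’s actual anomaly score):</p>

```python
import numpy as np

def flag_anomalies(inputs, reconstructions, threshold):
    """Mark samples whose reconstruction error exceeds a threshold."""
    errors = np.mean((inputs - reconstructions) ** 2, axis=1)  # per-sample MSE
    return errors > threshold

x = np.array([[0.0, 0.0], [5.0, 5.0]])          # second sample is "anomalous"
x_hat = np.array([[0.1, -0.1], [0.0, 0.0]])     # model reconstructs it poorly
print(flag_anomalies(x, x_hat, threshold=1.0))  # [False  True]
```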
<p><img src="/images/2021-07-06-WeekJul5Rdg-fig1.png" alt="Fig 1" title="Figure 1" /> <br />
Figure 1 - Source: [1]</p>
<p>A more detailed view of the VELC model is shown in Figure 1. The VAE itself uses an LSTM as the encoder and decoder, because the LSTM is designed to process time-series data. There is a constraint network that learns the latent space in the VAE alongside the encoder and decoder during training. The purpose of the constraint network is to limit the samples pulled from the latent space during testing to only look like samples it saw during training - in other words, the constraint network ensures that the VELC model only pulls normal samples from the latent space of the VAE. The second re-encoder maps the output of the first decoder to a new latent space. The authors argue that the second re-encoder helps to ensure that the model trains more accurately, and computes more accurate anomaly scores, than it would with just a classical VAE structure [1].</p>
<h2 id="paper-2-a-deep-neural-network-for-unsupervised-anomaly-detection-and-diagnosis-in-multivariate-time-series-data-by-zhang-et-al">Paper 2: A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data by Zhang et al.</h2>
<p>This paper introduces a new model, Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED), which extends the capabilities of VELC so that instead of considering a single time series, we can perform anomaly detection across multiple time series at the same time. (The authors refer to this as <strong>multivariate time series</strong> data.) Zhang et al. argue that their model is the first to simultaneously complete 3 tasks [2]:</p>
<ol>
<li>Anomaly detection: as above</li>
<li>Root cause identification: identifying which time series signal(s) in the input are contributing to the anomaly</li>
<li>Anomaly severity: giving the user a metric that estimates how strongly the anomaly deviates from normal data</li>
</ol>
<p><img src="/images/2021-07-06-WeekJul5Rdg-fig2.png" alt="Fig 2" title="Figure 2" /> <br />
Figure 2 - Source: [2]</p>
<p>The graphical abstract for this paper is given in Figure 2. The basic idea of this model is similar to the VELC model in that it also tries to reconstruct an input signal and uses the reconstruction error as an indication of whether that input is normal or anomalous. The MSCRED model also uses a variant of an LSTM to handle the time series data, similar to VELC. The difference is that the MSCRED model assumes that there may be useful correlations between different time series signals that we should look for in order to identify anomalies in the full dataset. Let’s take a closer look at the different components of the MSCRED model to understand how it looks for correlations across time series signals as well as within them [2].</p>
<p>The authors assume that the raw data is in the form of <em>n</em> time series that extend for <em>T</em> time; they also assume that the data is normal for time in the interval [0, <em>T</em>] but that the data input to the model after that time can be abnormal. Their real-world example is a power plant that has time series data from different sensors that together can be used to look for anomalies that could be indications of potential failures. But the input to the MSCRED is not the raw time series data. Instead, Zhang et al. compute the pairwise correlations between each time series and save that data in <em>n x n</em> <strong>signature matrices</strong>. It is these signature matrices that then become the input to the model [2].</p>
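<p>A sketch of how such a signature matrix might be built (the inner-product form and the window length are my illustrative choices, not necessarily the paper’s exact definition):</p>

```python
import numpy as np

def signature_matrix(X, w):
    """Pairwise correlation of n time series over the last w time steps.

    X has shape (n, T); the result is (n, n), where entry (i, j) measures
    how series i and j co-vary over the window.
    """
    window = X[:, -w:]               # the most recent w samples of each series
    return window @ window.T / w

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100))        # 4 hypothetical sensor streams
S = signature_matrix(X, w=10)
print(S.shape)                       # (4, 4)
```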
<p>The first component of the model is a convolutional encoder that is designed to look for correlations across the time series signals (i.e. for correlations between the entries in the signature matrix, where each entry is the correlation between two signals). This convolutional encoder learns to represent the spatial information in the signature matrices, and passes this on to an attention-based convolutional LSTM model (ConvLSTM). The authors explain that they adapted the original ConvLSTM model [3], which was able to learn the temporal information in a video sequence, but struggled to perform over longer time intervals. To mitigate this issue, they add an attention mechanism to the original ConvLSTM which allows it to selectively remember the relevant hidden states across time steps, increasing the memory of the model. Together, the attention mechanism and ConvLSTM are capable of finding both temporal and spatial patterns in the signature matrix, and they return feature maps indicating which elements in time and space are important to pay attention to. The feature maps are processed by a convolutional decoder and used to reconstruct the signature matrices. We use the residual of the signature matrices (i.e. the difference between the input and the output signature matrices) to compute a reconstruction score and identify which inputs are anomalies and which are normal. These scores help identify the anomalies, diagnose which signals contributed to the anomaly (i.e. root cause analysis) and the scores also contain information about the severity of the anomaly [2].</p>
<h2 id="paper-3-learning-to-simulate-complex-physics-with-graph-networks-by-sanchez-gonzalez-et-al">Paper 3: Learning to Simulate Complex Physics with Graph Networks by Sanchez-Gonzalez et al.</h2>
<p>DeepMind delivers again with a beautiful paper on how graph networks can form the basis of a <strong>learned simulator</strong> that can model the physics of a variety of systems (i.e. from water to sand). This paper introduces a Graph Network-based Simulator (GNS) that uses a graph-based representation to model the physics of a system of particles. The authors argue that the value of building a learned simulator with artificial intelligence is that it can (a) be built much faster, (b) run more efficiently (both in time and in memory allocation) and (c) remain efficient when scaled up to larger systems [4].</p>
<p><img src="/images/2021-07-06-WeekJul5Rdg-fig3.png" alt="Fig 3" title="Figure 3" /> <br />
Figure 3 - Source: [4]</p>
<p>As shown in Figure 3, the overall architecture of DeepMind’s solution relies on a learned simulator that regularly updates the model of the system to accurately recreate the dynamics of a group of particles that represent water or sand or other fluid or rigid systems. In Figure 3, the simulator uses some learned dynamics, \(d_{\theta}\), to update the states of all the particles and simulate their trajectories. The learned dynamics \(d_{\theta}\) are implemented with a set of graph networks [4]. Let’s dive into the structure of \(d_{\theta}\) in more detail.</p>
<p>The dynamics are modeled using 3 key components: an encoder, a processor and a decoder. The encoder takes as input the first state of all of the particles in the system, and encodes that information into a graph in the latent space. This graph is then passed to the processor, which learns message-passing functions that connect all the nodes in the graph together (within a certain radius) and generates a series of output latent graphs that represent the progression of the system over time. The decoder extracts the relevant dynamics information (i.e. accelerations) from the final latent graph and passes them to an update mechanism, which in this case is a simple Euler integration function. In essence, this entire GNS is still using integration to solve the dynamics by stepping through sequential time steps - the only complexity comes in to how the dynamics are learned and represented and applied. The authors argue that this model is very general and so can be used to represent many different types of particle systems [4].</p>
<p>This all still felt a little general to me until I read deeper into the methods section to understand how each piece (the encoder, the processor and the decoder) was actually implemented. The encoder takes in the position, last <em>C</em> velocities and static properties of each particle and assigns one node to each particle. The encoder then learns functions to embed the input data into the nodes and edges of this graph. These encoder embedding functions are MLPs. The graphs embedded by the encoder are then passed to the processor which has a stack of <em>M</em> graphs with identical structure. The processor learns edge and node update functions, which are also MLPs. Finally, the decoder has a learned function (also an MLP) that is applied to each node on the final graph from the processor and outputs the second derivatives for each node to be passed to the update mechanism [4].</p>
<p>I also wanted to point out that I loved the way that this paper was written. Its figures do an excellent job of giving a high-level view of the architecture that is simple without losing too much resolution. The paper is written so that the reader spirals around the architecture, adding successively more detail on each pass. In total, I believe the authors went through the architecture three times, each time adding more information on the philosophy and implementation of the model design. This is something I would really like to do in my own writing.</p>
<h2 id="references">References:</h2>
<p>[1] Zhang, C., Li, S., Zhang, H., & Chen, Y. (2019). VELC: A New Variational AutoEncoder Based Model for Time Series Anomaly Detection. https://arxiv.org/abs/1907.01702v2</p>
<p>[2] Zhang, C., Song, D., Chen, Y., Feng, X., Lumezanu, C., Cheng, W., Ni, J., Zong, B., Chen, H., & Chawla, N. V. (2018). A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data. 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, 1409–1416. https://arxiv.org/abs/1811.08055v1</p>
<p>[3] Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; and Woo, W.c. 2015. Convolutional lstm network: A machine learning approach for precipitation nowcasting. In NIPS, 802–810.</p>
<p>[4] Sanchez-Gonzalez, A., Godwin, J., Pfaff, T., Ying, R., Leskovec, J., & Battaglia, P. W. (2020). Learning to Simulate Complex Physics with Graph Networks.</p>
<h1>Getting Some Intuition for Matrix Exponentials</h1>
<p><em>11 Jun 2021</em></p>
<p>In a <a href="https://sassafras13.github.io/MLSBasics/">recent post</a>, we talked about some fundamental mathematical operations presented in a robotics textbook written by Murray, Li and Sastry. One of these operations was a <strong>matrix exponential</strong>, which was unfamiliar to me. It turns out that matrix exponentials are a really cool idea that appear in lots of fields of science and engineering. Thanks to a fabulous video by 3Blue1Brown [1], I am going to present some of the basic concepts behind matrix exponentials and why they are useful in robotics when we are writing down the kinematics and dynamics of a robot.</p>
<p>In MLS, the authors explain that matrix exponentials are useful for “map[ping] a twist into the corresponding screw motion” [2]. Recall that a twist is infinitesimally small and the screw contains the full magnitude of the motion [2]. Another way to say this is that the matrix exponential can encode a rotation as a function of the direction of rotation and the angle of rotation [2]. I will explain this in more detail in this post.</p>
<h2 id="basic-definition-of-a-matrix-exponential">Basic Definition of a Matrix Exponential</h2>
<p>A matrix exponential is related to the simpler concept of raising the number <em>e</em> to a real number exponent. We can write this operation as an infinite sum [1]:</p>
\[e^x = x^0 + x^1 + \frac{1}{2}x^2 + \frac{1}{6}x^3 + … + \frac{1}{n!}x^n + …\]
<p>Notice that this expression is a <a href="https://sassafras13.github.io/TaylorSeries/">Taylor series</a>. The sum of the Taylor series approaches the value of \(e^x\) [1].</p>
<p>A matrix exponential is, in a sense, an extension of this idea, using matrices as input \(x\) instead of real numbers. For example, I can rewrite the expression above with \(x = \left[ \matrix{1 & 2 \cr 3 & 4} \right]\) [1]:</p>
\[e^{\left[ \matrix{1 & 2 \cr 3 & 4} \right]} = \left[ \matrix{1 & 2 \cr 3 & 4} \right]^0 + \left[ \matrix{1 & 2 \cr 3 & 4} \right]^1 + \frac{1}{2}\left[ \matrix{1 & 2 \cr 3 & 4} \right]^2 + \frac{1}{6}\left[ \matrix{1 & 2 \cr 3 & 4} \right]^3 + …\]
<p>This still makes sense because I can raise a matrix to any nonnegative integer power by multiplying the matrix by itself <em>n</em> times. And this infinite series converges for any square matrix - in this case, to a stable matrix [1].</p>
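<p>As a sanity check, the truncated series can be computed directly (a sketch using NumPy; the number of terms is an arbitrary choice that happens to converge well for small matrices):</p>

```python
import numpy as np

def expm_series(M, terms=30):
    """Matrix exponential via the truncated Taylor series sum of M^k / k!."""
    result = np.zeros_like(M, dtype=float)
    term = np.eye(M.shape[0])            # k = 0 term: M^0 / 0! = I
    for k in range(terms):
        result += term
        term = term @ M / (k + 1)        # next term: M^(k+1) / (k+1)!
    return result

# For a diagonal matrix the answer is known exactly: e^diag(a, b) = diag(e^a, e^b).
print(expm_series(np.diag([1.0, 2.0])))
```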
<p>The matrix exponential is useful in mathematics when we are trying to solve a system of differential equations. For example, let’s say I want to find expressions for \(x(t)\), \(y(t)\) and \(z(t)\) given the equations below [1]:</p>
<p>\(\frac{dx}{dt} = a \cdot x(t) + b \cdot y(t) + c \cdot z(t)\) <br />
\(\frac{dy}{dt} = d \cdot x(t) + e \cdot y(t) + f \cdot z(t)\) <br />
\(\frac{dz}{dt} = g \cdot x(t) + h \cdot y(t) + i \cdot z(t)\)</p>
<p>I can use a matrix exponential to find the coefficients of the functions [1]:</p>
\[e^{\left[ \matrix{a & b & c \cr d & e & f \cr g & h & i} \right]t}\]
<p>More generally, if I have a system of equations \(X(t)\) and a matrix of coefficients \(M\) then I can solve the following differential equation written in terms of linear algebra to find expressions for all the functions contained in \(X(t)\) [1]:</p>
\[\frac{d}{dt}X(t) = MX(t)\]
<h2 id="remembering-e-and-its-derivative">Remembering e and Its Derivative</h2>
<p>There is another expression that looks remarkably similar to the one shown just above. Specifically, the derivative of \(e\) has the same form as the equation that can be used to solve a system of differential equations [1]:</p>
\[\frac{d}{dt}e^{rt} = re^{rt}\]
<p>(Keep in mind that we need to also take into account initial conditions if we want to find the solution to a specific system of equations [1].)</p>
<p>Bringing it all together, it can be shown (check out the last part of [1]) that the derivative of the matrix exponential follows the same form as the derivative of e when raised to a real number, that is [1]:</p>
\[\frac{d}{dt} e^{Mt} X_0 = M \big( e^{Mt} X_0 \big)\]
<h2 id="showing-that-the-definition-of-a-matrix-exponential-is-correct">Showing that the Definition of a Matrix Exponential is Correct</h2>
<p>Let’s take a simple example of how the matrix exponential is used to encode rotations, which we mentioned earlier is one of the reasons why they are so useful in robot kinematics. When we find a matrix that correctly encodes a given rotation, we will see that it is typically a skew-symmetric matrix which can be used to convert between screws and twists [2].</p>
<p>Consider the matrix, \(\left[ \matrix{0 & -1 \cr 1 & 0} \right]\). This matrix is a solution to the following system of equations [1]:</p>
\[\frac{d}{dt} \left[ \matrix{x(t) \cr y(t)} \right] = \left[ \matrix{0 & -1 \cr 1 & 0} \right] \left[ \matrix{x(t) \cr y(t)} \right]\]
<p>Geometrically, this expression indicates that the rate of change of \(\left[ \matrix{x(t) \cr y(t)} \right]\) is tangent to the direction of \(\left[ \matrix{x(t) \cr y(t)} \right]\) and has the same magnitude (this is shown in Figure 1) [1].</p>
<p><img src="/images/2021-06-11-MatrixExps-fig1.png" alt="Fig 1" title="Figure 1" /> <br />
Figure 1 - Source [1]</p>
<p>But mathematically, why does this make sense? If we compute the Taylor series of \(e^{\left[ \matrix{0 & -1 \cr 1 & 0} \right]t}\), we will find that each term in the matrix becomes an infinite sum with a specific pattern as follows [1]:</p>
\[e^{\left[ \matrix{0 & -1 \cr 1 & 0} \right] t} = \left[ \matrix{ 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + … & -t + \frac{t^3}{3!} - \frac{t^5}{5!} + \frac{t^7}{7!} - … \cr t - \frac{t^3}{3!} + \frac{t^5}{5!} - \frac{t^7}{7!} + … & 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + …} \right]\]
<p>And guess what? Those infinite sums are exactly the Taylor series for the sine and cosine functions [1]:</p>
\[e^{\left[ \matrix{0 & -1 \cr 1 & 0} \right] t} = \left[ \matrix{ \cos(t) & -\sin(t) \cr \sin(t) & \cos(t)} \right]\]
<p>This is exactly the matrix for a counterclockwise rotation by the angle \(t\) [1]. How cool is that? So now we have direct mathematical proof that the matrix exponential of \(\left[ \matrix{0 & -1 \cr 1 & 0} \right] t\) is exactly the rotation matrix for the angle \(t\). This is why the matrix exponential is so useful in robot kinematics: the skew-symmetric matrix \(\left[ \matrix{0 & -1 \cr 1 & 0} \right]\) encodes the rotation in a compact form, and exponentiating it produces the full rotation [1].</p>
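<p>We can check this identity numerically (using a hand-rolled truncated series rather than scipy.linalg.expm so the snippet stays self-contained):</p>

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated Taylor series approximation of the matrix exponential."""
    result, term = np.zeros_like(M, dtype=float), np.eye(M.shape[0])
    for k in range(terms):
        result += term
        term = term @ M / (k + 1)
    return result

t = np.pi / 2                                     # a quarter turn
M = np.array([[0.0, -1.0], [1.0, 0.0]])
R = expm_series(M * t)
rotation = np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])
print(np.allclose(R, rotation))                   # True
```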
<h2 id="references">References:</h2>
<p>[1] “How (and why) to raise e to the power of a matrix, DE6.” 3Blue1Brown. 1 Apr 2021. <a href="https://www.youtube.com/watch?v=O85OWBJ2ayo&list=PLZHQObOWTQDNPOjrT6KVlfJuKtYTftqH6&index=6">https://www.youtube.com/watch?v=O85OWBJ2ayo&list=PLZHQObOWTQDNPOjrT6KVlfJuKtYTftqH6&index=6</a> Visited 11 Jun 2021.</p>
<p>[2] Murray, R., Li, Z., Sastry, S. “A Mathematical Introduction to Robotic Manipulation.” CRC Press. 1994.</p>
<h1>Spatial Velocity vs. Body Velocity</h1>
<p><em>10 Jun 2021</em></p>
<p>As I have been learning the mathematics behind robot kinematics, I have been struggling to understand the difference between a spatial velocity and a body velocity. In this post, I am going to try to write a good definition of each of these velocity types.</p>
<h2 id="basic-definitions">Basic Definitions</h2>
<p>According to Murray, Li and Sastry (MLS), the definitions are as follows [1]:</p>
<ul>
<li>
<p><strong>Spatial velocity</strong>: mathematically defined as \(\hat{V}^s = \dot{R}R^T = \dot{g} g^{-1}\). MLS says that the spatial velocity of a rigid motion is the instantaneous velocity of the body as viewed in the spatial frame. It is the “velocity of a (possibly imaginary) point on the rigid body which is traveling through the origin of the spatial frame at time <em>t</em>” [1]. So if you are standing at the origin of a spatial frame and you measure the velocity of a point attached to the rigid body and going through the origin where you are standing, that is the spatial velocity [1].</p>
</li>
<li>
<p><strong>Body velocity</strong>: mathematically defined as \(\hat{V}^b = R^T \dot{R} = g^{-1} \dot{g}\). MLS says that the body velocity is more “straightforward” to understand; it is the “velocity of the origin of the body coordinate frame relative to the spatial frame, as viewed in the current body frame” [1]. MLS points out that the body velocity is not the velocity of the body relative to the body frame, because that is always zero [1]!</p>
</li>
</ul>
<h2 id="body-velocity">Body Velocity</h2>
<p>These definitions were a little confusing to me at first. Let’s take a step back and think about this with some more intuition. Let’s start with the body velocity, because I can relate that easily to the experience of riding in a car. If you imagine that you are riding in a car at a constant velocity, then the only way you know that you are moving is if you compare your frame of reference (the <strong>body frame</strong>, B) with some fixed frame (the <strong>spatial frame</strong>, A). The body velocity is your instantaneous velocity as measured with respect to this fixed frame, A, but observed from your perspective inside the car’s body frame, B. So when your car’s speedometer tells you that you are moving at 60 MPH, that means that you are moving at 60 MPH with respect to the spatial frame, A, as observed from within the body frame, B [2].</p>
<p>We can define your body velocity in this instance as follows. Let’s assume that your position in the car is a point \(q\) and so your body velocity is written as [2]:</p>
\[v_{q_b} = g_{ab}^{-1} v_{q_a}\]
<p>Where \(g_{ab}\) is the transformation that converts a point in the body frame, B, to the spatial frame, A, and \(v_{q_a}\) is the velocity of the point \(q\) in the spatial frame, A. Another way to write \(v_{q_a}\) is \(v_{q_a} = \dot{g}_{ab} q_{b}\) *1 so we can substitute that in:</p>
\[v_{q_b} = g_{ab}^{-1} \dot{g}_{ab} q_{b}\]
<p>Now I can write my body velocity as a function of <strong>some point \(q_b\) in the body frame that has velocity with respect to the spatial frame, A, as viewed in the body frame, B</strong> [2].</p>
\[v_{q_b} = \hat{V}_{ab}^b q_{b}\]
<p>In other words:</p>
\[\hat{V}_{ab}^b = g_{ab}^{-1} \dot{g}_{ab}\]
<h2 id="spatial-velocity">Spatial Velocity</h2>
<p>The difference between the spatial velocity and the body velocity is that, instead of observing your velocity in the car in the body frame, B, you are observing it from the origin of the spatial frame, A. That is, the <strong>spatial velocity measures the velocity of a point fixed to the body frame, B, with respect to the spatial frame, A, and observed from the spatial frame, A</strong> [2]. I can write the spatial velocity as follows [2]:</p>
\[v_{q_a} = \dot{g}_{ab} q_b\]
<p>I can rewrite \(q_b\) to be in terms of \(q_a\) as follows [2]:</p>
\[q_b = g_{ab}^{-1} q_a\]
<p>And therefore I can write the spatial velocity as [2]:</p>
\[v_{q_a} = \dot{g}_{ab} g_{ab}^{-1} q_a\]
<p>And so we see that [2]:</p>
\[\hat{V}_{ab}^s = \dot{g}_{ab} g_{ab}^{-1}\]
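To make the distinction concrete, here is a small NumPy sanity check (the example motion and the helper names are my own, not from MLS or [2]): a body frame B that rotates while its origin orbits the spatial origin, so that the body pivots rigidly about the spatial origin.

```python
import numpy as np

def g_of_t(t):
    """Homogeneous transform g_ab(t): frame B rotates about the spatial
    z-axis while its origin orbits the spatial origin at radius 1, so the
    body pivots rigidly about the spatial origin."""
    c, s = np.cos(t), np.sin(t)
    g = np.eye(4)
    g[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    g[:3, 3] = [c, s, 0]
    return g

t, h = 0.7, 1e-6
g = g_of_t(t)
g_dot = (g_of_t(t + h) - g_of_t(t - h)) / (2 * h)  # central difference for g'(t)

V_spatial = g_dot @ np.linalg.inv(g)  # \hat{V}^s = \dot{g} g^{-1}
V_body = np.linalg.inv(g) @ g_dot     # \hat{V}^b = g^{-1} \dot{g}

# The body point sitting at the spatial origin is the pivot, so the spatial
# (translational) velocity is zero here...
assert np.allclose(V_spatial[:3, 3], 0, atol=1e-6)
# ...while, viewed in the body frame, the origin of B moves at unit speed
# along the body's own +y axis.
assert np.allclose(V_body[:3, 3], [0, 1, 0], atol=1e-6)
# The two are related by conjugation: \hat{V}^s = g \hat{V}^b g^{-1}.
assert np.allclose(V_spatial, g @ V_body @ np.linalg.inv(g), atol=1e-6)
```

The zero spatial translational velocity is exactly MLS’s “(possibly imaginary) point traveling through the origin of the spatial frame”: for this motion, that point is the pivot.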
<p>I hope this helps a little bit with getting to grips with the differences between a spatial velocity and a body velocity. I think that it all comes down to where you are observing the velocity from.</p>
<h2 id="footnotes">Footnotes:</h2>
<p>*1 This is by the definition of the <a href="https://sassafras13.github.io/MLSBasics/">rigid motion mapping</a>, that is [2]:</p>
\[q_{a} = g_{ab} q_b\]
<p>And the derivative is [2]:</p>
\[\dot{q}_a = \dot{g}_{ab} q_b\]
<h2 id="references">References:</h2>
<p>[1] Murray, R., Li, Z., Sastry, S. “A Mathematical Introduction to Robotic Manipulation.” CRC Press. 1994.</p>
<p>[2] Sastry, S. “EE106A Discussion 7: Velocities and Adjoints.” EE C106A: Introduction to Robotics, Fall 2019 course notes. UC Berkeley. <a href="https://ucb-ee106.github.io/ee106a-fa19/assets/discussions/D7___Velocities_and_Adjoints.pdf">https://ucb-ee106.github.io/ee106a-fa19/assets/discussions/D7___Velocities_and_Adjoints.pdf</a> Visited 10 Jun 2021.</p>
<h1 id="week-of-may-31-paper-reading">Week of May 31 Paper Reading</h1>
<p><em>Posted 2 Jun 2021. <a href="http://sassafras13.github.io/weekMay31rdg">http://sassafras13.github.io/weekMay31rdg</a></em></p>
<p>This week we return to reading some papers about DNA nanotechnology that my PI recommended to me a while ago and I hadn’t read before now.</p>
<h2 id="paper-1-rolling-up-gold-nanoparticle-dressed-dna-origami-into-three-dimensional-plasmonic-chiral-nanostructures-by-shen-et-al-1">Paper 1: Rolling Up Gold Nanoparticle-Dressed DNA Origami into Three-Dimensional Plasmonic Chiral Nanostructures by Shen et al. [1]</h2>
<p>Today I learned that there is a field of nanotechnology focused on arranging metallic nanoparticles in very precise structures so that they can affect light on the visible spectrum. The authors of this paper demonstrate one such approach that uses DNA origami to arrange gold nanoparticles into helical 3D structures. The idea is that DNA origami is a bottom-up manufacturing approach that has more resolution, precision and flexibility than top-down manufacturing methods like lithography. The authors showed that their structure achieved plasmonic resonance for light with a wavelength of approximately 525nm [1].</p>
<p>One thing the authors did that was particularly interesting to me was that they used a multistage assembly process to make their structure. First, they folded a flat rectangle of DNA origami using one annealing process; then they did a secondary anneal that attached the gold nanoparticles to the sheet, and then finally they performed a tertiary assembly process that rolled the sheet into a tube. The one detail that was missing was what the temperature for the tertiary assembly process was. The secondary anneal ended at room temperature, so perhaps the tube was rolled at room temperature as well, but I would be curious to know for sure what the authors did [1].</p>
<h2 id="paper-2-dna-origami-meets-polymers-a-powerful-tool-for-the-design-of-defined-nanostructures-by-hannewald-et-al-2">Paper 2: DNA Origami Meets Polymers: A Powerful Tool for the Design of Defined Nanostructures by Hannewald et al. [2]</h2>
<p>This is a detailed review of the current state of the art in building polymers using DNA origami. They present different strategies for building polymers out of DNA or for using DNA as a templating material for building polymers out of other materials. They indicate that this is an emerging area of research because there are not a large number of papers in the field yet, and they do a good job of highlighting some of the challenges involved that are preventing the field from really taking off [2].</p>
<p>The challenges range from manufacturing-related to characterization of the results of manufacturing. For example, the authors point out that often the steric hindrance of the materials in use can make it difficult for robust assembly to occur. Another challenge they highlighted was that often the yields of the assembly process are too low for characterization using gel electrophoresis, NMR or DLS [2].</p>
<h2 id="references">References:</h2>
<p>[1] Shen, X., Song, C., Wang, J., Shi, D., Wang, Z., Liu, N., & Ding, B. (2012). Rolling up gold nanoparticle-dressed dna origami into three-dimensional plasmonic chiral nanostructures. Journal of the American Chemical Society, 134(1), 146–149. https://doi.org/10.1021/ja209861x</p>
<p>[2] Hannewald, N., Winterwerber, P., Zechel, S., Ng, D. Y. W., Hager, M. D., Weil, T., & Schubert, U. S. (2021). DNA Origami Meets Polymers: A Powerful Tool for the Design of Defined Nanostructures. In Angewandte Chemie - International Edition (Vol. 60, Issue 12, pp. 6218–6229). John Wiley and Sons Inc. https://doi.org/10.1002/anie.202005907</p>
<h1 id="fundamentals-from-murray-li-and-sastry">Fundamentals from Murray, Li and Sastry</h1>
<p><em>Posted 1 Jun 2021. <a href="http://sassafras13.github.io/MLSBasics">http://sassafras13.github.io/MLSBasics</a></em></p>
<p>I have been trying to learn some of the fundamentals of writing equations of motion for multilink robots from the classic text by Murray, Li and Sastry*1. There were some key ideas and mathematical notations that I wanted to record for myself as I continue to work through the text [1]. I also wanted to try out using MathJax to write equations instead of my former method. Many thanks to Ian Goodfellow’s blog post on the topic [2].</p>
<h2 id="basic-terminology">Basic Terminology</h2>
<p>In MLS, the first thing we are introduced to are some terms for different kinds of motion and different ways to represent those motions mathematically:</p>
<ul>
<li><strong>Rigid motion</strong>: Motion that preserves distances between two points on the moving body.</li>
</ul>
<p>We can represent rigid body motion using a mapping, \(g: \mathbb{R}^3 \rightarrow \mathbb{R}^3\), for points, and an associated mapping, \(g_{*}\), for vectors. We talk more about <em>g</em> in the Configurations section below.</p>
<ul>
<li>
<p><strong>Screw motion</strong>: A combination of rotation and translation that occurs about/along a line.</p>
</li>
<li>
<p><strong>Twist</strong>: A representation of a screw motion that is infinitesimally small.</p>
</li>
</ul>
<p>A twist can be written as \(\xi = (v, \omega) \in \mathbb{R}^6\).</p>
<ul>
<li><strong>Wrench</strong>: A representation of a system of forces acting on a rigid body that condenses the forces and torques into 1 force and 1 torque acting along the same line.</li>
</ul>
<p>A wrench can be written as \(F = (f, \tau) \in \mathbb{R}^6\).</p>
<p>Essentially, the <strong>twist</strong> represents the kinematics of a system, while the <strong>wrench</strong> represents the dynamics.</p>
<h2 id="se3-and-so3">SE(3) and SO(3)</h2>
<p>We use different Euclidean and other groups of transformations to describe space [3]. All Euclidean groups contain transformations that preserve the Euclidean distance between two points [3]. The <strong>Euclidean distance</strong> is just the distance between two points as measured by a line that directly connects them [4]. Euclidean groups can contain translations, rotations and reflections. There are two Euclidean groups that we will see a lot in MLS: SE(3) and SO(3).</p>
<p>The <strong>special Euclidean group, SE(3)</strong> is a group of Euclidean transformations that represent rigid motions. They include translations and rotations, but not reflections [3]. The 3 in SE(3) indicates that this Euclidean group exists in 3D space - similarly, SE(2) refers to a special Euclidean group in 2D space. We can define SE(3) as follows:</p>
\[SE(3) = \big\{ A \mid A = \left[ \matrix{R & r \cr 0 & 1} \right], R \in \mathbb{R}^{3 \times 3}, r \in \mathbb{R}^3, R^TR = RR^T = I, \det R = +1 \big\}\]
<p>Transformations that are part of a special Euclidean group can simultaneously translate and rotate a vector. They are a continuous group so therefore they are differentiable and can be considered a Lie group.</p>
<p>The other Euclidean group that we will see a lot is the <strong>special orthogonal group, SO(3)</strong>. This is the set of all rotations in 3D space [1]. It is a “special” group because the determinant of all the matrix rotations, det R, is always +1 [1]:</p>
\[SO(3) = \big\{ R \in \mathbb{R}^{3 \times 3} \mid RR^T = I, \det R = +1 \big\}\]
<p>We can say that SE(3) is just the product of \(\mathbb{R}^3\) with SO(3), that is [1]:</p>
\[SE(3) := \mathbb{R}^3 \times SO(3)\]
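These defining properties are easy to check numerically. Here is a short NumPy sketch (the helper names are my own) that tests membership in SO(3) and assembles the homogeneous-matrix form of an element of SE(3):

```python
import numpy as np

def is_SO3(R, tol=1e-9):
    """Check the defining properties of SO(3): R R^T = I and det R = +1."""
    return bool(np.allclose(R @ R.T, np.eye(3), atol=tol)
                and np.isclose(np.linalg.det(R), 1.0, atol=tol))

def make_SE3(R, p):
    """Assemble the homogeneous matrix [[R, p], [0, 1]] representing (p, R)."""
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = p
    return g

theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0,              0,             1]])
assert is_SO3(Rz)
assert not is_SO3(np.diag([1.0, 1.0, -1.0]))  # a reflection: det = -1, not in SO(3)

g = make_SE3(Rz, np.array([1.0, 2.0, 3.0]))
assert g.shape == (4, 4)
```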
<h2 id="configurations">Configurations</h2>
<p>Earlier, we introduced the idea that we could write the configuration of a robot given some rigid body transformation, so now let’s formalize that a little. In general, I can write the configuration of a frame of reference, B, with respect to another frame of reference, A, as follows [1]:</p>
\[g_{ab} = (p_{ab}, R_{ab}) \in SE(3)\]
<p>(I could also say that \(g_{ab} \in SE(3)\) and specifically \(p_{ab} \in \mathbb{R}^3\) and \(R_{ab} \in SO(3)\) [1].)</p>
<p>The configuration \(g_{ab}\) contains a translation, \(p_{ab}\) and a rotation, \(R_{ab}\) which convert the coordinates of a point in the frame B to equivalent coordinates in the frame A.</p>
<p>There is a matrix form for writing the configuration as well [1]:</p>
\[g_{ab} = \left[ \matrix{R & p \cr 0 & 1} \right]\]
<p>We call \(g\) a configuration, but this same representation can also be used to write a rigid body transformation [1].</p>
<h2 id="-notation">^ Notation</h2>
<p>As we will see in subsequent sections, there is some hatting notation that gets used frequently in MLS which I had not seen before and so would benefit from some additional explanation. Specifically, we frequently see \(\hat{a}\) is used to represent a vector cross-product operation [1]:</p>
\[a \times b = \left[ \matrix{a_2 b_3 - a_3 b_2 \cr a_3 b_1 - a_1 b_3 \cr a_1 b_2 - a_2 b_1} \right]\]
<p>And we can capture the operation that <strong>a</strong> performs on <strong>b</strong> as [1]:</p>
\[\hat{a} = \left[ \matrix{0 & -a_3 & a_2 \cr a_3 & 0 & -a_1 \cr -a_2 & a_1 & 0} \right]\]
<p>Notice that \(\hat{a}\) is a skew-symmetric matrix, i.e. \(\hat{a}^T = -\hat{a}\) [1].</p>
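In code, the hat operator is a three-line function; a quick NumPy check (the function name is mine) confirms that \(\hat{a}b = a \times b\) and that \(\hat{a}\) is skew-symmetric:

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix hat(a) such that hat(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-4.0, 5.0, 0.5])
assert np.allclose(hat(a) @ b, np.cross(a, b))  # reproduces the cross product
assert np.allclose(hat(a).T, -hat(a))           # skew-symmetry
```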
<h2 id="hatting-and-unhatting-or-vee-and-wedge-operators">“Hatting” and “Unhatting” (or “Vee” and “Wedge” Operators)</h2>
<p>Related to the previous idea, we can convert between vectors and matrices in our notation using the “vee” and “wedge” operators and MLS does this frequently throughout the book. (This conversion is also often referred to as “hatting” and “unhatting,” I believe.)</p>
<p>The “vee” operator extracts a vector from a matrix [1]:</p>
\[\left[ \matrix{\hat{\omega} & v \cr 0 & 0} \right]^{\vee} = \left[ \matrix{v \cr \omega} \right]\]
<p>And conversely the “wedge” converts a vector to a matrix [1]:</p>
\[\left[ \matrix{v \cr \omega} \right]^{\wedge} = \left[ \matrix{\hat{\omega} & v \cr 0 & 0} \right]\]
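A minimal NumPy implementation of the two operators for twists (function names are my own) shows that they are inverses of each other:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of w, so that hat(w) @ b == np.cross(w, b)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def wedge(xi):
    """Twist vector xi = (v, omega) in R^6 -> 4x4 matrix [[hat(omega), v], [0, 0]]."""
    m = np.zeros((4, 4))
    m[:3, :3] = hat(xi[3:])
    m[:3, 3] = xi[:3]
    return m

def vee(m):
    """Inverse of wedge: recover (v, omega) from a 4x4 twist matrix."""
    return np.concatenate([m[:3, 3], [m[2, 1], m[0, 2], m[1, 0]]])

xi = np.array([1.0, 2.0, 3.0, 0.1, 0.2, 0.3])  # v = (1, 2, 3), omega = (0.1, 0.2, 0.3)
assert np.allclose(vee(wedge(xi)), xi)  # wedge followed by vee is the identity
```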
<h2 id="matrix-exponentials">Matrix Exponentials</h2>
<p>In MLS we often want to write the rotation of a point or vector as a function of the direction of rotation, \(\omega\), and some angle of rotation, \(\theta\) [1]. To do that, imagine we are going to compute the velocity of a point that is rotating, as follows [1]:</p>
\[\dot{q}(t) = \omega \times q(t) = \hat{\omega}q(t)\]
<p>This is a linear, time-invariant differential equation, and we can integrate to write [1]:</p>
\[q(t) = e^{\hat{\omega}t} q(0)\]
<p>(If you are familiar with the state transition matrix from control theory, this is a very similar idea.) Now we have \(e^{\hat{\omega}t}\) which is called a matrix exponential [1]. It allows us to write a function for the rotation, R, as being dependent on \(\omega\) and \(\theta\) by \(R(\omega, \theta) = e^{\hat{\omega}\theta}\). We can write the matrix exponential as an infinite series [1]:</p>
\[e^{\hat{\omega}t} = I + \hat{\omega}t + \frac{(\hat{\omega}t)^2}{2!} + \frac{(\hat{\omega}t)^3}{3!} + \cdots\]
<p>But generally infinite series are not useful in a computational setting, so we need to find a closed-form representation of the matrix exponential. Moreover, MLS argues that in general it is useful to write a matrix exponential or a skew-symmetric matrix representation as the product of a <strong>unit</strong> skew-symmetric matrix and a real number [1]. So we can define \(\lvert \omega \rvert = 1\) and a real number \(\theta\) so that we can rewrite the expression above as:</p>
\[e^{\hat{\omega}\theta} = I + \theta \hat{\omega} + \frac{\theta^2}{2!} \hat{\omega}^2 + \frac{\theta^3}{3!} \hat{\omega}^3 + \cdots\]
<p>From here, we can write an analytical expression for this version of the matrix exponential using <strong>Rodrigues’ formula</strong> [1]:</p>
\[e^{\hat{\omega}\theta} = I + \hat{\omega} \sin \theta + \hat{\omega}^2(1 - \cos \theta)\]
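We can verify Rodrigues’ formula against a truncated version of the infinite series. In this NumPy sketch (helper names are mine), the closed form matches a 30-term partial sum and produces a valid rotation matrix:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of w (the vector cross-product operator)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(w, theta):
    """Rodrigues' formula for exp(hat(w) * theta); assumes w is a unit vector."""
    W = hat(w)
    return np.eye(3) + W * np.sin(theta) + W @ W * (1.0 - np.cos(theta))

def expm_series(A, terms=30):
    """Truncated power series I + A + A^2/2! + ... for the matrix exponential."""
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

w = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)  # unit rotation axis
theta = 0.9
R = rodrigues(w, theta)
assert np.allclose(R, expm_series(hat(w) * theta))  # closed form matches the series
assert np.allclose(R @ R.T, np.eye(3))              # R is orthogonal...
assert np.isclose(np.linalg.det(R), 1.0)            # ...with det +1, so R is in SO(3)
```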
<h2 id="velocity">Velocity</h2>
<p>We can write the <strong>spatial velocity</strong> of a rigid body as follows [1]:</p>
\[\hat{V}^{s} = \dot{g}g^{-1}\]
<p>Where the spatial velocity is defined as the velocity of a rigid body as observed at the origin of the reference frame. We could also define the <strong>body velocity</strong> as [1]:</p>
\[\hat{V}^{b} = g^{-1}\dot{g}\]
<p>Where the body velocity is the velocity of the object as written in the instantaneous body frame.</p>
<h2 id="adjoint-transformations">Adjoint Transformations</h2>
<p>We use adjoint transformations to transform twists from one coordinate frame to another. We can use them with either velocities or wrenches [1]. In general, we can write an adjoint transformation as [1]:</p>
\[\text{Ad}_g = \left[ \matrix{R & \hat{p}R \cr 0 & R} \right]\]
<p>If I want to apply an adjoint transformation to a wrench, I would write [1]:</p>
\[F_b = \text{Ad}_{g_{bc}}^T F_c\]
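As a sketch (helper names are mine, and I am using the convention above, where \(\text{Ad}_g\) maps body twists to spatial twists), we can build the 6×6 adjoint and check the group property \(\text{Ad}_{g^{-1}} = (\text{Ad}_g)^{-1}\):

```python
import numpy as np

def hat(p):
    """Skew-symmetric matrix of p (the cross-product operator)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def adjoint(R, p):
    """6x6 adjoint Ad_g = [[R, hat(p) R], [0, R]] of the transform g = (p, R)."""
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[:3, 3:] = hat(p) @ R
    Ad[3:, 3:] = R
    return Ad

theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
p = np.array([1.0, 0.0, 2.0])
Ad = adjoint(R, p)

# Inverse of a rigid transform: g^{-1} = (-R^T p, R^T).  The adjoint respects
# the group structure, so Ad_{g^{-1}} equals (Ad_g)^{-1}.
assert np.allclose(adjoint(R.T, -R.T @ p), np.linalg.inv(Ad))
```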
<h2 id="conclusion">Conclusion</h2>
<p>I hope this post helps capture some key facts from Chapter 2 of MLS. I may write additional posts in the future on other topics from the book, or on some of the interesting mathematical tools used here. Thanks for reading!</p>
<h2 id="footnotes">Footnotes:</h2>
<p>*1 Hardcore nerds call the book “MLS” for short.</p>
<h2 id="references">References:</h2>
<p>[1] Murray, R., Li, Z., Sastry, S. “A Mathematical Introduction to Robotic Manipulation.” CRC Press. 1994. <a href="http://www.cse.lehigh.edu/~trink/Courses/RoboticsII/reading/murray-li-sastry-94-complete.pdf">http://www.cse.lehigh.edu/~trink/Courses/RoboticsII/reading/murray-li-sastry-94-complete.pdf</a> Visited 01 Jun 2021.</p>
<p>[2] Goodfellow, Ian. “LaTex in Jekyll.” 7 Nov, 2016. <a href="http://www.iangoodfellow.com/blog/jekyll/markdown/tex/2016/11/07/latex-in-markdown.html">http://www.iangoodfellow.com/blog/jekyll/markdown/tex/2016/11/07/latex-in-markdown.html</a> Visited 24 May 2021.</p>
<p>[3] Wikipedia. “Euclidean group.” <a href="https://en.wikipedia.org/wiki/Euclidean_group">https://en.wikipedia.org/wiki/Euclidean_group</a> Visited 01 Jun 2021.</p>
<p>[4] Wikipedia. “Euclidean distance.” <a href="https://en.wikipedia.org/wiki/Euclidean_distance">https://en.wikipedia.org/wiki/Euclidean_distance</a> Visited 01 Jun 2021.</p>
<h1 id="week-of-may-24-paper-reading">Week of May 24 Paper Reading</h1>
<p><em>Posted 29 May 2021. <a href="http://sassafras13.github.io/weekMay24rdg">http://sassafras13.github.io/weekMay24rdg</a></em></p>
<p>Last week I explored some ideas around motif-centric learning and then began to investigate the importance of equivariance and invariance in deep learning models. This week, I continue to explore why equivariance is important and useful. I also dive into the idea of an attention mechanism and transformers, ideas that I have heard of but never studied.</p>
<h2 id="paper-1-trajectory-prediction-using-equivariant-continuous-convolution-by-walters-et-al-1">Paper 1: Trajectory Prediction using Equivariant Continuous Convolution by Walters et al. [1]</h2>
<p>In this paper, Prof. Rose Yu’s group presents a novel model for predicting the trajectories of cars and pedestrians, called Equivariant Continuous Convolution (ECCO). Instead of using GNNs, they argue that they can leverage the symmetry in the trajectory data and learn to predict the trajectories more efficiently using a continuous convolution model. Specifically, they want their model to capture the fact that some aspects of the trajectory data should be the same regardless of how the data is rotated, like the cars’ velocities, while other aspects should be rotated along with the frame of reference, like the direction in which the car is turning at the intersection. Walters et al. present a very cool idea of using a torus kernel to capture this rotational symmetry in the data where it is present. After testing and comparing their model to the state of the art, Walters et al. conclude that their approach is more sample efficient because it trains faster and requires approximately 10X fewer parameters than other models [1].</p>
<p>My big takeaway from this paper is that if you can show that your model is equivariant with respect to the input data, it is more likely to generalize well. This is very useful because models that are able to generalize will provide better predictions in novel conditions, and they will be more sample efficient [1]. This helps me answer my question from last week which is: why is model equivariance useful?</p>
<h2 id="paper-2-attention-is-all-you-need-by-vaswani-et-al-2">Paper 2: Attention Is All You Need by Vaswani et al. [2]</h2>
<p>I think this is a pretty significant landmark paper that introduced the Transformer to the deep learning community in 2017. The central argument in this paper is that only the attention mechanism, commonly found in CNNs, RNNs and LSTMs, is necessary to complete sequence transduction modeling activities. The authors argue that the Transformer is a much simpler architecture than its competitors, and thus is much faster and cheaper to train while achieving comparable performance. They demonstrate their model’s efficacy with translation tasks [2].</p>
<p>I need to get a better handle on the mathematical underpinnings of attention in neural networks, but I think the fundamental idea is that attention is akin to taking the dot product between a query and the key of a key-value pair. This result is then scaled and passed through the softmax function: this becomes the weights applied to the values of the key-value pairs. The transformer still has an encoder and decoder (similar to earlier sequence-to-sequence architectures) and we perform this attention operation over every input to the encoder and over every intermediate layer’s output from the encoder, and so on through the decoder (there is masking in some layers of the decoder to capture the sequential nature of the input data) [2].</p>
<p>I think that the overall idea is that by computing these dot products (i.e. projecting a vector onto different bases) we are finding the connections between every input value and every other input value. Ultimately we find the strong connections that help us complete the task at hand.</p>
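My mental model of that computation, written out as a minimal NumPy sketch of scaled dot-product attention (the shapes and names are my own, not from the paper’s implementation):

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # one weight per (query, key) pair
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 queries of dimension 8
K = rng.standard_normal((6, 8))  # 6 keys
V = rng.standard_normal((6, 8))  # 6 values, one per key
out, w = attention(Q, K, V)

assert out.shape == (4, 8)               # one output vector per query
assert np.allclose(w.sum(axis=-1), 1.0)  # each query's weights sum to 1
```

Each output row is a weighted average of the values, with weights given by how strongly that query projects onto each key.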
<h2 id="references">References:</h2>
<p>[1] Walters, R., Li, J., & Yu, R. (2020). Trajectory Prediction using Equivariant Continuous Convolution. <a href="http://arxiv.org/abs/2010.11344">http://arxiv.org/abs/2010.11344</a></p>
<p>[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010.</p>
<h1 id="week-of-may-17-paper-reading">Week of May 17 Paper Reading</h1>
<p><em>Posted 21 May 2021. <a href="http://sassafras13.github.io/weekMay17rdg">http://sassafras13.github.io/weekMay17rdg</a></em></p>
<p>With the start of summer, and the transition to the final two years of the PhD program (yikes!), I have been thinking a lot about what I want to focus on learning during these next two years. One thing I want to do is read papers more frequently so that I stay on top of what is happening in my field. I have never been very good at doing this but now that I have completed most of the requirements in the PhD program (classes, TA assignments, etc.) I have more time to dedicate to doing this well. I’m going to try to write a blog post every week summarizing the papers I have read in that time, and my target is to read 4-5 papers a week for right now. This is going to be my first attempt at recording what I’ve read and learned.</p>
<h2 id="paper-1-structure-motif-centric-learning-framework-for-inorganic-crystalline-systems-by-banjade-et-al-1">Paper 1: Structure motif-centric learning framework for inorganic crystalline systems by Banjade et al. [1]</h2>
<p>This paper presents a novel approach to learning to predict the material properties (like bandgap and formation energy) for a dataset of approximately 22,000 inorganic crystalline compounds (specifically, metal ions). First the paper shows that the structure of a “motif” of a crystalline structure (i.e. the base unit that is repeated to form the structure) contains key information that helps cluster the samples in the dataset. They do this using t-SNE. The results of this clustering do not seem to be used in the second part of the paper, so I am a little confused about the purpose of this section. Next, the authors present a dual-graph approach (i.e. two graphs, one to represent the motif itself on the atomic level, and one to represent how the motif assembles) that can learn to make predictions about the properties of the structures input to the model. The model is called AMDNet and has been shown to outperform other standard models used in the field [1].</p>
<p>I like this paper because it presents a framework for showing that the data you have collected on a structure is relevant for capturing important information about the structure. It also presents a possible model that I could adapt to my self-assembly system to make predictions about what structures it will form. I would like to read references 31 and 48 if I have time.</p>
<h2 id="paper-2-motif-driven-contrastive-learning-of-graph-representations-by-zhang-et-al-2">Paper 2: Motif-Driven Contrastive Learning of Graph Representations by Zhang et al. [2]</h2>
<p>This is reference 48 from paper 1 above (reference 31 turned out to be outside my research area). This is a more recent paper than [1] that builds on the idea that using graph motifs can be helpful in training graph neural networks (GNNs) to complete various tasks. Specifically in this paper, the authors present 3 key contributions. First, they argue that their approach to using domain-specific graph motifs (in this case, they are working with small molecules so a motif could be a Benzene ring, for instance) allows them to gain improvements in contrastive learning performance over other methods that operate at the node level or use randomly selected subgraphs. Secondly, they show that they can learn to embed a graph motif in a continuous representation, which allows for differentiation and learning via gradient descent. And finally, they show that after pre-training a GNN with these motifs via contrastive learning, the GNNs perform better than the state-of-the-art in chemical property prediction tasks [2].</p>
<p>This paper was interesting to me for two main reasons: first, I think the idea of leveraging motifs in the training process could be useful to my work too. And secondly, the paper does a good job of demonstrating how their novel contributions add to the performance gains of the model by using an ablation study [2]. I might try to use a similar approach in a future paper, too.</p>
<h2 id="paper-3-universal-invariant-and-equivariant-graph-neural-networks-by-keriven-and-peyre-3">Paper 3: Universal Invariant and Equivariant Graph Neural Networks by Keriven and Peyre [3]</h2>
<p>This week I attended a talk by the <a href="https://www.cmu.edu/aced/sciML.html">Scientific Machine Learning Webinar Series</a> at CMU about using GNNs to model fluid flow. One of the questions during the Q&A period mentioned the idea of invariant and equivariant GNNs, and this idea was entirely new to me. I found this paper while I was looking for answers, and it gave a good overview of the concept, as well as a detailed set of mathematical proofs [3].</p>
<p>The key idea here is that GNNs should be either <strong>invariant</strong> or <strong>equivariant</strong>. GNNs that are <strong>invariant</strong> to permutations in the nodes of the input graph will return the same output regardless of the order of input. For example, if I am providing a graph as input using an adjacency matrix, it should not matter what order the nodes are listed in the matrix - my GNN should return the same output value every time regardless of the node ordering. This property is applicable when we are using a GNN to process a graph as input and provide some scalar as output. One example of this could be using a GNN to predict the solubility of small molecules when their atomic structure is provided as input [3].</p>
<p>Similarly, GNNs are <strong>equivariant</strong> if the output changes to follow permutations in the input. This is applicable when the GNN is going to provide a graph as output. For instance, imagine that you want to take in a modular robot structure represented as a graph and return a graph representing commands that should be sent to each module (many thanks to my colleague Julian Whitman for this <a href="https://ml4eng.github.io/camera_readys/49.pdf">idea</a>). In this situation, if you change the order in which you are representing the modules on the robot, you want the GNN to understand there has been a change and modify the output commands so they are still associated with the appropriate module [3].</p>
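To ground the two definitions, here is a toy NumPy example (entirely my own construction, not from the paper): a sum-pooling readout is invariant to node permutations, while a single message-passing step is equivariant, provided the adjacency matrix is permuted consistently.

```python
import numpy as np

def invariant_readout(X):
    """Sum-pool node features: the output ignores node ordering."""
    return X.sum(axis=0)

def equivariant_layer(X, A):
    """One toy message-passing step: permuting the nodes permutes the output rows."""
    return np.tanh(A @ X)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))                    # 5 nodes, 3 features each
A = rng.integers(0, 2, size=(5, 5)).astype(float)
A = np.maximum(A, A.T)                             # symmetric adjacency matrix
P = np.eye(5)[rng.permutation(5)]                  # a random permutation matrix

# Invariance: reordering the nodes leaves the pooled output unchanged.
assert np.allclose(invariant_readout(P @ X), invariant_readout(X))
# Equivariance: reordering the inputs reorders the outputs the same way.
assert np.allclose(equivariant_layer(P @ X, P @ A @ P.T),
                   P @ equivariant_layer(X, A))
```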
<p>This paper provides a mathematical proof that certain GNNs are invariant or equivariant. Specifically, they prove “that a GNN described by a single set of parameters can approximate uniformly well a function that acts on graphs of varying size.” The authors explain that they are building upon a fundamental result for classical MLPs developed in the 1980s and 1990s, called the universal approximation theorem [3]. I want to read more about this next because I do not know anything about the significance of this result and it sounds important.</p>
<p>To be honest, I did not read through the mathematical details of the proof in great detail. I also got the impression during the webinars that these two properties have inspired different approaches to training or building GNNs, but I do not fully understand that idea yet. I heard that Prof. Rose Yu is thinking about the importance of invariance and equivariance in her research on GNNs, so perhaps I should start by reading some of her work to get a better understanding of the concepts’ significance.</p>
<h2 id="references">References:</h2>
<p>[1] Banjade, H. R., Hauri, S., Zhang, S., Ricci, F., Gong, W., Hautier, G., Vucetic, S., & Yan, Q. (2021). Structure motif–centric learning framework for inorganic crystalline systems. Science Advances, 7(17), eabf1754. <a href="https://doi.org/10.1126/sciadv.abf1754">https://doi.org/10.1126/sciadv.abf1754</a></p>
<p>[2] Zhang, S., Hu, Z., Subramonian, A., & Sun, Y. (n.d.). Motif-Driven Contrastive Learning of Graph Representations. <a href="https://doi.org/10.1145/1122445.1122456">https://doi.org/10.1145/1122445.1122456</a></p>
<p>[3] Keriven, N., & Peyré, G. (n.d.). Universal Invariant and Equivariant Graph Neural Networks. <a href="https://arxiv.org/abs/1905.04943">arXiv:1905.04943</a>.</p>
<h1 id="afm-scanning-parameters-and-image-artifacts">AFM Scanning Parameters and Image Artifacts</h1>
<p><em>Posted 7 Apr 2021. <a href="http://sassafras13.github.io/ImageArtifactsScanParams">http://sassafras13.github.io/ImageArtifactsScanParams</a></em></p>
<p>I have been writing a series of posts on different aspects of atomic force microscopy, starting with this <a href="https://sassafras13.github.io/AFM/">post</a> describing the fundamental operating principles, and this <a href="https://sassafras13.github.io/ProbeSelection/">post</a> on how to choose a probe for your application. Right now, I want to talk about two related aspects of AFM imaging: how to optimize your scanning parameters, and what common image artifacts you should look out for while you’re imaging. Let’s get to it!</p>
<h2 id="optimizing-scanning-parameters">Optimizing Scanning Parameters</h2>
<p>I will start by introducing some of the parameters we commonly tune during AFM imaging, and then I will describe each in more detail below. This information is also summarized in Figure 1. The first parameter we typically control is the <strong>setpoint</strong>. This parameter defines how much interaction force the system should maintain between the probe tip and the sample surface. Next, we have the <strong>drive amplitude</strong>, which is the amplitude at which the cantilever and tip are oscillating in tapping mode. (I don’t believe I have access to this parameter in my setup, so I will only briefly touch on it here.) We also have control over the control system’s <strong>gains</strong>, which determine the sensitivity of the control loop. We can change the <strong>scan rate</strong> or <strong>scanning speed</strong> while we are imaging, which means we change how quickly the probe rasters over the sample surface. Finally, we can choose the <strong>resolution</strong> of our scan image to determine how many pixels we want in our image [1].</p>
<p><img src="/images/2021-04-07-ImageArtifactsScanParams-fig1.png" alt="Fig 1" title="Figure 1" /> <br />
Figure 1 - Source: [1]</p>
<h3 id="setpoint">Setpoint</h3>
<p>The setpoint tells the AFM control loop what level of probe-sample interaction to maintain while scanning. The quantity actually being held constant depends on the mode: in contact mode it is the cantilever deflection, while in tapping mode it is the oscillation amplitude [1]. If you choose a larger setpoint value, you are asking the system to hold the probe farther away from the sample surface, and therefore you want a lower amount of interaction force between the probe and the sample. <strong>If your image is too noisy, you should increase your setpoint.</strong> Conversely, bringing the setpoint down increases the interaction between the probe and the sample. <strong>Decrease the setpoint for better quality images</strong> [1].</p>
<h3 id="driven-amplitude">Drive Amplitude</h3>
<p>The drive amplitude is the amplitude of the oscillation that the AFM applies to the cantilever during tapping mode scanning. Increasing the drive amplitude can improve our phase signal data, up to a point [1].</p>
<h3 id="gain">Gain</h3>
<p>The gain amplifies the signal from the probe. Increasing the gain can help improve signal quality, but it will also amplify signal noise [1].</p>
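<p>To make the setpoint and gain concrete, here is a minimal sketch of the kind of amplitude feedback loop running inside the instrument (this is an illustration, not any real AFM’s firmware; the function name, gain values and amplitudes are all made up): the controller compares the measured oscillation amplitude to the setpoint and adjusts the z-piezo to cancel the error, with the gains scaling how aggressively it reacts.</p>

```python
# Minimal sketch of a tapping-mode amplitude feedback loop.
# All names and values are illustrative, not real instrument parameters.

def feedback_step(setpoint, measured_amplitude, integral, kp=0.5, ki=0.1):
    """One iteration of a PI controller driving the z-piezo.

    A larger error (amplitude far from setpoint) produces a larger
    z correction; raising kp/ki makes the loop track the surface
    faster but also amplifies noise in the amplitude signal.
    """
    error = setpoint - measured_amplitude
    integral += error
    z_correction = kp * error + ki * integral
    return z_correction, integral

# Example: the amplitude has dropped below the setpoint (the tip is
# interacting too strongly), so the loop commands a z correction.
z, i = feedback_step(setpoint=1.0, measured_amplitude=0.8, integral=0.0)
```

<p>This is also why cranking the gains too high amplifies noise: every wiggle in the measured amplitude gets multiplied by the gains and fed straight into the z-piezo.</p>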
<h3 id="scan-ratespeed">Scan Rate/Speed</h3>
<p>This determines how quickly the probe travels over the sample surface. Generally speaking, a slower scan speed will lead to better images, but the tradeoff is that each image will take more time to collect [1].</p>
<h3 id="resolution">Resolution</h3>
<p>A high resolution will lead to better image quality, but it will also greatly increase the amount of time it will take to collect that image [1].</p>
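<p>The resolution/time tradeoff is easy to estimate: if the image has N scan lines and the scan rate is R lines per second, acquisition takes roughly N / R seconds, ignoring overhead. The numbers below are illustrative, not recommendations:</p>

```python
# Back-of-the-envelope scan time: an image with N scan lines acquired
# at a scan rate of R lines per second takes about N / R seconds.

def scan_time_seconds(lines, scan_rate_hz):
    return lines / scan_rate_hz

# Doubling the resolution from 256 to 512 lines at a 1 Hz scan rate
# doubles the acquisition time.
t_256 = scan_time_seconds(256, 1.0)  # ~4.3 minutes
t_512 = scan_time_seconds(512, 1.0)  # ~8.5 minutes
```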
<p>Before we move on to discuss image artifacts, let’s talk about what order you should tune your parameters in:</p>
<ul>
<li>
<p>Generally speaking, the setpoint and the gain should be the two parameters you change first as you’re trying to optimize your images [1]. Some advice I was given while training on the AFM was that you should always increase your gain first, and decrease your setpoint second.</p>
</li>
<li>
<p>If your signal is very noisy, try decreasing the gain. If that does not work, try modifying the scan rate - it could be too fast <em>or</em> too slow so try playing around with it to see if it improves your image quality [1].</p>
</li>
<li>
<p>Check the trace and retrace curves as you are scanning. This <a href="https://www.nanoandmore.com/pdfs/how-to-optimize-AFM-scan-parameters.pdf">poster</a> is a great reference for what your trace and retrace curves for imaging speed, gain and setpoint should look like [2].</p>
</li>
</ul>
<h2 id="image-artifacts">Image Artifacts</h2>
<p>Now that we have some idea of what our scanning parameters are and how we can tune them, let’s talk about some of the image artifacts we commonly see and what clues they give us about how we can improve our scanning parameters. Typically, image artifacts can be broken down into the following categories: <strong>probe</strong> artifacts, <strong>scanner</strong> artifacts, <strong>image processing</strong> artifacts, <strong>noise</strong> and <strong>process</strong> artifacts [3]. Note that an artifact is any feature in an image that shouldn’t be there (i.e. that does not really exist on the sample surface) [3].</p>
<h3 id="probe-artifacts">Probe Artifacts</h3>
<p>The probe tip is the part of the AFM setup that directly interacts with the sample surface, so any damage to it can produce image artifacts. A chipped tip can make every feature appear the same size and shape (often triangular), because the damaged tip geometry, rather than the sample, dominates the image; chips happen when the tip is smashed into the sample surface or dragged along it too quickly [3]. The tip can also pick up dirt or particulate from the sample, which can alter the tip geometry or create a secondary tip, making the image show doubled features (i.e. as though you had crossed your eyes while looking at the image) [3].</p>
<h3 id="scanner-artifacts">Scanner Artifacts</h3>
<p>The piezoelectric stage is responsible for moving the sample relative to the probe, so it can also contribute artifacts if it is not operating properly. The stage will always suffer from some hysteresis and creep, which can add artifacts to an image. You can quantify this by scanning a calibration sample, measuring the dimensions of a reference feature, and determining how far off your measurements are [3]. Hysteresis in particular can distort the apparent shape and spacing of features. Be especially careful of hysteresis if you are scanning in an extreme region in x, y or z (i.e. near the maximum or minimum of their ranges); you are more likely to see errors due to poor calibration or hysteresis in these regions far from the center of the operating envelope [3]. If the background appears curved, this can also be caused by hysteresis of the stage, or by the sample not being mounted on a level surface [3].</p>
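<p>As a sketch of the calibration check described above (the pitch values below are made up), the error is just the fractional difference between what you measure and what the reference grating is specified to be:</p>

```python
# Quantify scanner calibration error from a reference grating.
# nominal is the manufacturer-specified feature spacing;
# measured is what your scan reports. Values are illustrative.

def calibration_error_percent(measured, nominal):
    return 100.0 * (measured - nominal) / nominal

# e.g. a 10 um pitch grating measured as 10.4 um -> +4% error,
# a hint that the scanner needs recalibration along that axis.
err = calibration_error_percent(measured=10.4, nominal=10.0)
```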
<h3 id="image-processing-artifacts">Image Processing Artifacts</h3>
<p>There are a couple of things that you can do in the post-processing step that can negatively impact your images. For example, line leveling can create bands in the image; use a mask to exclude tall features from the fit while leveling to remove these bands. Filtering can also change the feature dimensions (especially in z), so be careful of reporting incorrect dimensions if you have used filtering. Filtering (such as Fourier filtering) can also add noisy nodules to an image [3].</p>
<p>It is good practice to collect z-height measurements from a histogram of multiple measurements instead of taking the measurement from a single line profile. Also, be wary of images that have no noise at all - this is more likely to indicate that something is wrong than that you are a genius AFM tuner [3].</p>
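<p>Here is a minimal sketch of the histogram approach with synthetic data (the step height, noise level, and number of profiles are all invented for illustration):</p>

```python
import numpy as np

# Instead of one line profile, pool height readings from many profiles
# and report a robust statistic. The data here is synthetic: we pretend
# the same 2 nm step was measured in 50 line profiles, each corrupted
# by ~0.2 nm of noise.
rng = np.random.default_rng(0)
measurements = 2.0 + 0.2 * rng.standard_normal(50)

height = np.median(measurements)   # robust central value to report
spread = np.std(measurements)      # quote this as the uncertainty
counts, edges = np.histogram(measurements, bins=10)  # inspect the shape
```

<p>Looking at the histogram also tells you whether the distribution is single-peaked; a double peak would suggest the profiles are sampling two different features.</p>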
<h3 id="sources-of-noise">Sources of Noise</h3>
<p>Some common sources of noise in your image are ambient vibrations (construction, vehicles outside, etc.) and sample contamination. Electronic noise from the machine setup can also interfere with your images - try shifting the scan rate or the drive (tapping) frequency to move away from the offending noise band.</p>
<h3 id="process-artifacts">Process Artifacts</h3>
<p>If you are scanning too quickly, then you may not give the probe enough time to smoothly travel over sample surfaces and accurately measure them. If the features look odd or misshapen, try slowing down the scan speed. If the features that you are measuring appear too large, this is also a good indication that you need to alter the setpoint and scan speed. You can also try scanning in different directions - if a particular feature exists in images taken while scanning in two different directions, it is more likely to be real than if it only shows up when scanning in a particular direction. And finally, if you see low frequency waves in the image, this can be caused by light reflecting off the sample surface, which means that you have not properly centered the laser on the probe cantilever [3].</p>
<p>Alright, I think that is about all I have to say about AFM imaging for today. Thanks for reading!</p>
<h2 id="references">References:</h2>
<p>[1] “Step-by-step instruction for using the Digital Instruments Multimode AFM.” <a href="http://mcf.tamu.edu/wp-content/uploads/2016/09/MultiMode-AFM-Instructions_Sep2010.pdf">http://mcf.tamu.edu/wp-content/uploads/2016/09/MultiMode-AFM-Instructions_Sep2010.pdf</a> Visited 07 Apr 2021.</p>
<p>[2] “Choose your AFM settings properly!” NanoAndMore. <a href="https://www.nanoandmore.com/pdfs/how-to-optimize-AFM-scan-parameters.pdf">https://www.nanoandmore.com/pdfs/how-to-optimize-AFM-scan-parameters.pdf</a> Visited 07 Apr 2021.</p>
<p>[3] “Artifacts in AFM Images.” AFM Workshop seminar. <a href="https://www.afmworkshop.com/learning-center/afm-webinars">https://www.afmworkshop.com/learning-center/afm-webinars</a> Visited 07 Apr 2021.</p>A Brief Guide to AFM Probe Selection2021-04-07T00:00:00+00:002021-04-07T00:00:00+00:00http://sassafras13.github.io/ProbeSelection<p>Recently I wrote a <a href="https://sassafras13.github.io/AFM/">post</a> describing the fundamental operating principles for atomic force microscopy (AFM). Today, I would like to follow up with a more focused discussion about the different kinds of AFM probes and how to select one for your application. I will start by talking about probe design in general, and then walk us through some different parameters that we need to consider when selecting probes, and how they affect the probe’s performance. Let’s get started!</p>
<h2 id="overall-probe-design">Overall Probe Design</h2>
<p>The AFM probe is the part of the AFM setup that interacts directly with the sample surface. It is made up of three primary components: the chip (substrate), the cantilever that extends from the chip, and the tip which is on the end of the cantilever and interacts with the sample surface [1]. We manipulate the probe during set up by handling the chip, since this is the largest and most robust part of the probe. The cantilever is much smaller and extends from the end of the chip. In dynamic scanning, the cantilever is usually vibrating so we need to choose cantilever parameters that allow it to vibrate at a desired range of frequencies. Finally, the tip is the smallest, most precise part of the probe because it is the component that will interact directly with the sample. The tip is usually deposited on the end of the cantilever using a special process [1].</p>
<p><img src="/images/2021-01-29-AFM-fig2.png" alt="Fig 1" title="Figure 1" /> <br />
Figure 1 - Source: [2]</p>
<p>The probe is typically made using anisotropic silicon etching to a wide range of specifications. The typical lifetime of the probe is usually determined by how long the tip remains intact, and this can depend on many factors, including how aggressively the probe is used (i.e. how smoothly the tip approach is conducted, how fast scanning is done, etc.) and how experienced the user is [1]. As a noob I break a lot of my probes before I even place them in the holder; that’s how budget probe manufacturers stay in business.</p>
<p>Let me give you some more details on the cantilever and tip designs, since the design of these components determines the probe’s overall performance. The cantilever establishes the interaction force between the probe and the sample, and the corresponding parameter for this is the probe’s <strong>force constant</strong>. Stiffer cantilever beams have larger force constants. As I mentioned earlier, in dynamic scanning modes the cantilever is designed to vibrate, and so manufacturers will publish the <strong>resonant frequency</strong> of the cantilever as well. Finally, the <strong>coating</strong> of the cantilever is important to consider because the laser beam is incident on the top surface of the cantilever during scanning, and we want to ensure that we reflect that laser beam accurately towards the photodiode. Most coatings are optimized to be highly reflective, and the specific coating for a given probe actually depends mostly on the cantilever substrate material since certain coatings can only be applied to certain materials [1].</p>
<p>The tip itself is the most precise and delicate component of an AFM probe. The <strong>tip radius</strong> is an important factor to consider because this determines the smallest features you will be able to measure. You also want to choose an appropriate tip <strong>shape</strong> and <strong>aspect ratio</strong> - there are some specialized tip geometries for scanning specific features that can be difficult to access with a more standard design, such as grooves and other intricate surface geometries. You can also choose to purchase tips with specialized <strong>coatings</strong> which can help prolong their life span.</p>
<p>Now that we have given an overview of the parameters we need to consider when selecting a probe, let’s talk about each of them individually.</p>
<h2 id="force-constant">Force Constant</h2>
<p>The force constant defines the stiffness of the probe cantilever. This is particularly important in tapping mode, where we want stable oscillations while scanning. Stable oscillations give a cleaner signal, and we can obtain them by ensuring the probe has enough stiffness (and therefore stored energy) to overcome the adhesive and capillary interaction forces between the tip and the sample surface. This means that the sample itself should also be a determining factor in your choice of force constant: if you have low adhesion/interaction forces with your particular surface, then you should choose a probe that is less stiff. This will allow the probe to still be sensitive enough to detect changes in the sample surface. Put another way, choosing a cantilever beam that is more flexible will allow it to bend to follow the surface even if the surface is not pulling on the cantilever with a lot of force.</p>
<p>I will primarily be scanning soft DNA samples, and for these kinds of studies it is generally recommended to choose probes with stiffnesses that are comparable to the stiffness of the sample itself, or slightly stiffer [3]. If the sample is sticky, it will be helpful to have a slightly stiffer probe that can break away from the sticky surface while vibrating in tapping mode [3]. In general, probe stiffnesses range from less than 1 N/m to more than 50 N/m.</p>
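<p>For a rectangular cantilever you can sanity-check a datasheet force constant with the standard end-loaded beam formula k = Ewt³/4L³. The dimensions below are illustrative, and Young’s modulus for silicon depends on crystal orientation (roughly 130–190 GPa):</p>

```python
# Standard end-loaded rectangular beam stiffness: k = E*w*t^3 / (4*L^3).
# Dimensions are illustrative; E for silicon is orientation-dependent.

def force_constant(E, width, thickness, length):
    return E * width * thickness**3 / (4 * length**3)

# A typical-looking tapping-mode lever: 125 um long, 30 um wide, 4 um thick.
k = force_constant(E=169e9, width=30e-6, thickness=4e-6, length=125e-6)
print(f"k ~ {k:.0f} N/m")  # lands in the tens of N/m, a tapping-mode range
```

<p>Note how sensitive k is to thickness and length (cubic in both), which is why nominally identical probes can ship with force constants that differ by a factor of two or more.</p>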
<h2 id="resonant-frequency">Resonant Frequency</h2>
<p>In tapping mode AFM, the AFM control loop oscillates the probe at (approximately) its resonant frequency. So if you are choosing between two probes that have all the same parameters except for their resonant frequencies, you should typically choose the probe with the higher resonant frequency. The reason for this is that you want the tapping frequency of the probe to be as far away as possible from the scanning frequency (as determined by your scanning speed) [3]. The sample is going to respond to both the probe’s tapping frequency and scanning frequency, but if these two frequencies are far apart it is easier to isolate the sample’s response to the scanning frequency (which is much lower) from its response to the tapping frequency [3]. Also, generally samples experience less damage when probed at higher tapping frequencies. Generally, resonant frequencies above 300 kHz are recommended.</p>
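<p>You can also sanity-check a quoted resonant frequency: treating the lever as a simple harmonic oscillator, f₀ = (1/2π)√(k/m_eff), where for a rectangular beam the effective mass is roughly 0.24 times the total beam mass. The dimensions and density below are illustrative:</p>

```python
import math

# Fundamental resonance of a cantilever modeled as a harmonic
# oscillator: f0 = (1/(2*pi)) * sqrt(k / m_eff). For a rectangular
# lever, m_eff is roughly 0.24x the beam mass. Values are illustrative
# (silicon density ~2330 kg/m^3).

def resonant_frequency(k, width, thickness, length, density=2330.0):
    mass = density * width * thickness * length
    m_eff = 0.24 * mass
    return math.sqrt(k / m_eff) / (2 * math.pi)

f0 = resonant_frequency(k=40.0, width=30e-6, thickness=4e-6, length=125e-6)
print(f"f0 ~ {f0 / 1e3:.0f} kHz")
```

<p>With these made-up dimensions the estimate lands in the few-hundred-kHz range typical of tapping-mode levers.</p>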
<h2 id="cantilever-coating">Cantilever Coating</h2>
<p>I’m not big on materials science so I won’t go into a lot of detail here. I think the general wisdom in regards to probe coatings is that if it’s included in the probe design you should be willing to pay for it. Gold and/or chromium coatings are pretty common and inexpensive [2].</p>
<h2 id="tip-radius">Tip Radius</h2>
<p>As I explained in my previous post, the probe tip radius affects the resolution of your scans in the x-y plane (the resolution in the z-direction is controlled by the piezoelectric stage that moves the sample). Sharper probe tips allow you to detect smaller features on your sample, and they also reduce the amount of error in your lateral measurements. Generally you should choose a tip radius that is smaller than the smallest features you want to measure [1]. Keep in mind that super sharp tips (i.e. below 2 nm) can significantly increase the cost of the probe. (Often, though, manufacturers will provide free samples that can help you find a probe that meets your needs without breaking the bank.)</p>
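<p>A rough way to see why tip radius matters: modeling the tip as a sphere of radius R, a feature of height h ≤ R appears wider by about 2√(2Rh − h²) in the image due to tip convolution. The tip radii and feature height below are illustrative:</p>

```python
import math

# Geometric tip broadening: a spherical tip of radius R scanning a
# feature of height h (h <= R) widens it laterally by roughly
# 2*sqrt(2*R*h - h**2). All values below are illustrative.

def apparent_broadening(tip_radius, feature_height):
    h = min(feature_height, tip_radius)  # formula assumes h <= R
    return 2 * math.sqrt(2 * tip_radius * h - h**2)

# A 2 nm tall feature imaged with a 7 nm tip appears ~10 nm wider
# than it really is; a 2 nm tip cuts that to ~4 nm.
b_blunt = apparent_broadening(tip_radius=7e-9, feature_height=2e-9)
b_sharp = apparent_broadening(tip_radius=2e-9, feature_height=2e-9)
```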
<p>One thing I’d like to point out here is that the tip is so small that it can essentially be obliterated if it experiences an <strong>electrostatic discharge</strong>. In other words, if you just walked across a carpet to get to the AFM machine and then immediately picked up a probe with your forceps, you could accidentally shock your probe tip into oblivion. One way to avoid this is <em>not</em> to shuffle across the carpet, but a more robust solution is to wear an ESD bracelet while handling the probes. You can get these for cheap (less than the cost of a probe, that’s for sure) from Fry’s Electronics*1 or Micro Center - they’re commonly marketed to folks who like to build their own computers.</p>
<h2 id="tip-shape-aspect-ratio-and-coating">Tip Shape, Aspect Ratio and Coating</h2>
<p>I have not spent as much time considering these parameters so I will be more brief here. If you choose a sharp tip with a large aspect ratio (i.e. it is tall and skinny instead of being short and squat) then you can probe deeper into indentations and crevices in your sample, which may be useful [3]. However, if you want to study the forces exerted by your sample, then a probe with a larger surface area, or a shape that has been modeled analytically, may be more advantageous [3].</p>
<p>The tip coating contributes to its lifespan. Often you can find tips with diamond or “diamond-like” coatings that are supposed to enhance their hardness and longevity.</p>
<h2 id="q-factor">Q Factor</h2>
<p>One final parameter I want to discuss is called the Q factor, or <strong>quality factor</strong>. This is a dimensionless parameter that describes how underdamped an oscillator is [4]. Specifically, it is the ratio of the energy stored to the energy dissipated per radian of oscillation [4]. Therefore, a large Q factor indicates that the oscillator does not lose much energy during vibration, and that the oscillations die off more slowly - i.e. a large Q factor indicates that the oscillator does not experience much damping [4]. In general, a high Q factor is desirable because this indicates that the probe will be more sensitive to the sample profile. Rectangular cantilevers generally have higher Q factors than triangular cantilevers. This parameter is not always reported by the manufacturer.</p>
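<p>If your software displays the resonance curve during tuning, you can estimate Q yourself as the center frequency divided by the width of the resonance peak at half its maximum power (the numbers below are made up):</p>

```python
# Estimate the quality factor from the resonance curve:
# Q ~ f0 / delta_f, where delta_f is the full width of the peak
# at half its maximum power. Values are illustrative.

def q_factor(f0, fwhm):
    return f0 / fwhm

# A 300 kHz lever with a 1 kHz-wide resonance peak has Q ~ 300:
# a narrow peak means low damping and a more sensitive probe.
Q = q_factor(f0=300e3, fwhm=1e3)
```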
<p>That concludes my discussion about probe selection. I plan to write another post on image artifacts commonly seen during AFM imaging, and about how to choose good scanning parameters, so stay tuned!</p>
<h2 id="footnotes">Footnotes:</h2>
<p>*1 Wait, sorry, they’re out of business. RIP Fry’s, I will miss walking down your aisles admiring computer towers I can’t afford and don’t need.</p>
<h2 id="references">References:</h2>
<p>[1] “AFM Probe Selection.” AFM Workshop seminar. <a href="https://www.afmworkshop.com/learning-center/afm-webinars">https://www.afmworkshop.com/learning-center/afm-webinars</a> Visited 07 Apr 2021.</p>
<p>[2] “What is Atomic Force Microscopy (AFM).” NanoAndMore USA. <a href="https://www.nanoandmore.com/what-is-atomic-force-microscopy">https://www.nanoandmore.com/what-is-atomic-force-microscopy</a> Visited 05 Apr 2021.</p>
<p>[3] Gavara, N. (2017). A beginner’s guide to atomic force microscopy probing for cell mechanics. In Microscopy Research and Technique (Vol. 80, Issue 1, pp. 75–84). Wiley-Liss Inc. <a href="https://doi.org/10.1002/jemt.22776">https://doi.org/10.1002/jemt.22776</a></p>
<p>[4] “Q factor.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Q_factor">https://en.wikipedia.org/wiki/Q_factor</a> Visited 07 Apr 2021.</p>