<h1>Thermodynamics and Reaction Kinetics, Revisited</h1> <p>Emma Benjaminson, Mechanical Engineering Graduate Student. 11 May 2022. http://sassafras13.github.io/thermo-reaction</p> <p>It has been almost 10 years since I last took a thermodynamics course, so for recent research investigations I have gone back to brush up on the basics. This post is mostly going to be a thermodynamics and rate kinetics cheat sheet. At a high level, I wanted to understand thermodynamics and rate kinetics in the context of DNA nanotechnology, because these concepts give important insight into how DNA nanostructures form. The difference between thermodynamics and rate kinetics is that thermodynamics describes what direction a reaction will take, while rate kinetics provides information on how quickly that reaction will occur. We’ll start by discussing thermodynamics and then look at where rate kinetics fills in the gaps.</p> <h2 id="thermodynamics">Thermodynamics</h2> <p>I’ll focus my discussion here on classical thermodynamics because that is the branch that is commonly used to analyze DNA nanotechnology, but it is worth noting that there are also other branches such as statistical mechanics and non-equilibrium thermodynamics. Classical thermodynamics refers to the branch that studies processes that are nearly at equilibrium and have bulk properties that we can measure, which are appropriate assumptions for DNA nanotechnology.</p> <p>In classical thermodynamics, we have some basic concepts that we apply to the physical systems that we study. For example, we typically consider <strong>systems</strong> as the physical material of interest, and their <strong>surroundings</strong> as the environment around the material that is in contact with it. 
A system will have properties that we can update over time via equations, and we can combine measurements of these properties to define thermodynamic characteristics of the system like internal energy and thermodynamic potentials. We note that <strong>equilibrium</strong> in this context means that there can be small, random exchanges of energy between systems or between components in the system, but there is no net change in energy due to these exchanges. Classical thermodynamics focuses on near-equilibrium situations because these are easier to study and understand than non-equilibrium situations.</p> <p>Thermodynamic properties can be either <strong>intensive</strong> or <strong>extensive</strong>. Intensive properties are quantities whose magnitude is independent of the system size - for example, density, temperature and refractive index are all properties that are independent of the amount of material in the system. Conversely, extensive properties have a magnitude that varies with system size, such as mass, volume or entropy.</p> <h3 id="laws">Laws</h3> <p>At the center of classical thermodynamics are four laws, as follows:</p> <p><strong>Zeroth Law:</strong> “If two systems are each in thermal equilibrium with a third, they are also in thermal equilibrium with each other.” This law is essentially a definition of temperature.</p> <p><strong>First Law:</strong> The change in internal energy is equal to the energy gained as heat, minus the thermodynamic work done by the system on the surroundings. In other words:</p> $\Delta U = Q - W$ <p>Where $$\Delta U$$ is the change in internal energy, $$Q$$ is the heat energy input and $$W$$ is the thermodynamic work done by the system. 
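</p> <p>As a quick numerical check of the sign convention in this equation, here is a minimal Python sketch. The function name and the numbers (150 J of heat added, 60 J of work done by the system) are made up purely for illustration.</p>

```python
def internal_energy_change(heat_in_joules, work_by_system_joules):
    """First law with the convention Delta U = Q - W.

    Q is the heat added to the system and W is the work the system
    does on its surroundings, both in joules.
    """
    return heat_in_joules - work_by_system_joules


# Hypothetical example: the system absorbs 150 J of heat and does 60 J of work,
# so its internal energy rises by the difference between the two.
delta_u = internal_energy_change(150.0, 60.0)
print(f"Delta U = {delta_u} J")
```

<p>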
This law indicates that we can transfer energy between systems as either heat, work or mass.</p> <p><strong>Second Law:</strong> “Heat does not spontaneously flow from a colder body to a hotter.” This law is defining entropy: left to itself, an isolated system moves from more ordered configurations (fewer possible states) toward less ordered ones (more possible states), and not the other way around. The total entropy of the universe never decreases. Moreover, entropy is maximized at thermal equilibrium (more on this later).</p> <p><strong>Third Law:</strong> The entropy of a system approaches a constant value as its temperature approaches absolute zero, which establishes absolute zero as a reference point.</p> <h3 id="conjugate-variables">Conjugate Variables</h3> <p>In thermodynamics, we have many pairs of <strong>conjugate</strong> variables which, when multiplied together, describe the energy transferred from one system to another. One way to think about them is to relate them to a “force” and a “displacement” which, when multiplied together, give the work done. Some examples include:</p> <ul> <li>Pressure and volume</li> <li>Temperature and entropy</li> <li>Chemical potential and particle number</li> </ul> <p>Now let’s look at some specific concepts that we introduced above.</p> <h3 id="gibbs-free-energy">Gibbs Free Energy</h3> <p>We frequently use Gibbs free energy in DNA nanotechnology - it is a measure of the maximum work that a closed system can do at constant pressure and temperature. More precisely, it is the maximum amount of non-expansion work, i.e. work other than pressure-volume work; a closed system can exchange heat and work with its surroundings but not mass. We often use it to describe the hybridization reaction between two complementary strands of ssDNA. We can write the change in Gibbs free energy as:</p> $\Delta G = \Delta H - T \Delta S$ <p>Where $$\Delta G$$ is the change in Gibbs free energy, $$\Delta H$$ is the change in enthalpy of the system, $$T$$ is the absolute temperature and $$\Delta S$$ is the change in entropy of the system. 
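</p> <p>To make the roles of the enthalpy and entropy terms concrete, here is a small Python sketch that evaluates this expression. The function names and the numerical values are hypothetical, chosen only to resemble a reaction that is favorable because of its enthalpy; they are not measured DNA hybridization data.</p>

```python
def gibbs_free_energy_change(delta_h, temperature_k, delta_s):
    """Return Delta G = Delta H - T * Delta S.

    delta_h is in kJ/mol, delta_s in kJ/(mol*K) and temperature_k in kelvin,
    so the result is in kJ/mol.
    """
    return delta_h - temperature_k * delta_s


def crossover_temperature(delta_h, delta_s):
    """Temperature (in kelvin) at which Delta G changes sign, i.e. Delta G = 0."""
    return delta_h / delta_s


# Hypothetical values: binding releases heat (negative Delta H) but
# reduces disorder (negative Delta S).
delta_h = -200.0   # kJ/mol, assumed for illustration
delta_s = -0.55    # kJ/(mol*K), assumed for illustration

delta_g_room = gibbs_free_energy_change(delta_h, 298.15, delta_s)
t_cross = crossover_temperature(delta_h, delta_s)

print(f"Delta G at 298.15 K: {delta_g_room:.2f} kJ/mol")  # negative: favorable
print(f"Delta G changes sign near {t_cross:.1f} K")
```

<p>With these made-up values the enthalpy term outweighs the entropy penalty at room temperature, so $$\Delta G$$ is negative; above the crossover temperature the sign flips and the reaction is no longer favorable.</p> <p>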
We measure the Gibbs free energy in joules.</p> <p>The maximum work indicated by the Gibbs free energy can only be attained if the process is completely reversible. In general, we say that the change in Gibbs free energy must be less than the non-pressure-volume work (which is typically 0) for a process to be energetically favorable. This means that reactions which have a negative change in Gibbs free energy are typically energetically favorable, while positive values indicate that energy must be added to the reaction for it to occur. So in this way, we can think of the Gibbs free energy as the useful energy available in the system.</p> <h3 id="enthalpy">Enthalpy</h3> <p>We just encountered enthalpy when we defined the Gibbs free energy, so what is it? Enthalpy describes the energy contained in a chemical system. Strictly speaking, it is defined as the sum of a system’s internal energy and the product of pressure and volume, as shown below, but typically the product of pressure and volume is small relative to the internal energy:</p> $H = U + pV$ <p>Where $$H$$ is the enthalpy, $$U$$ is the internal energy, $$p$$ is the pressure and $$V$$ is the volume of the system. The enthalpy is a state function: a change in enthalpy depends only on the initial and final states of the system, not on the path taken between them.</p> <h3 id="entropy">Entropy</h3> <p>We also touched on entropy earlier, but to address it here, it is used to convey whether some processes are irreversible or entirely impossible. It is a measure of the disorder in a system, or, put another way, the number of available states for the system in its current configuration (more states is equivalent to more disorder). The entropy of an isolated system never decreases, and thermodynamic entropy is <strong>not</strong> conserved.</p> <h2 id="rate-kinetics">Rate Kinetics</h2> <p>So now we have seen that thermodynamics has many concepts that define how a reaction occurs and whether or not it will occur. 
But in the discussion above we never encountered a concept that could provide insight into how quickly a thermodynamically possible reaction would take place. This is where chemical kinetics, or rate kinetics, comes into play. In DNA nanotechnology, we often use fluorescent signals during the annealing process to tell us how quickly a structure is forming, i.e. to extract kinetic rates of a formation process.</p> <p>Typically, the rate at which a reaction occurs will depend on one or more of the following:</p> <ul> <li>The reactants involved</li> <li>The physical state of the reactants (i.e. solid, liquid, gas)</li> <li>The surface area (if the reactants are solid state)</li> <li>Concentration of reactants</li> <li>Temperature at which the reaction takes place</li> <li>The presence of catalysts</li> <li>Pressure</li> <li>Light absorption</li> </ul> <p>The reaction rate can be measured several different ways, but one common approach (aside from the approach of measuring fluorescent signals I described above, which is specific to DNA nanotechnology) is to measure how the concentrations of the reactants change over time. We say that <strong>dynamic equilibrium</strong> has been reached when a reversible reaction has equal forward and backward rates, so that, although the reaction is continuing to take place, there is no change in the concentrations of the reactants and the products. Free energies can, in fact, often be used to help predict reaction rates, although it is the free energy of activation rather than the overall $$\Delta G$$ of the reaction that sets how fast a reaction proceeds.</p> <h2 id="conclusion">Conclusion</h2> <p>Now that I’ve done a whirlwind tour of some of the key concepts of thermodynamics, I’m going to start diving more deeply into topics specific to DNA nanotechnology, beginning with my next post on cooperativity. Stay tuned!</p> <h2 id="references">References:</h2> <p>“Chemical kinetics.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Chemical_kinetics">https://en.wikipedia.org/wiki/Chemical_kinetics</a> Visited 10 May 2022.</p> <p>“Thermodynamics.” Wikipedia. 
<a href="https://en.wikipedia.org/wiki/Thermodynamics">https://en.wikipedia.org/wiki/Thermodynamics</a> Visited 11 May 2022.</p> <p>“Intensive and extensive properties.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Intensive_and_extensive_properties">https://en.wikipedia.org/wiki/Intensive_and_extensive_properties</a> Visited 10 May 2022.</p> <p>“Gibbs free energy.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Gibbs_free_energy">https://en.wikipedia.org/wiki/Gibbs_free_energy</a> Visited 10 May 2022.</p> <p>“Enthalpy.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Enthalpy">https://en.wikipedia.org/wiki/Enthalpy</a> Visited 11 May 2022.</p> <p>“Entropy.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Entropy">https://en.wikipedia.org/wiki/Entropy</a> Visited 11 May 2022.</p> <h1>Recursion</h1> <p>5 Jan 2022. http://sassafras13.github.io/recursion</p> <p>I recently wrote a blog post about the backtracking approach to solving problems in computer science. 
Backtracking requires the use of <strong>recursion</strong>, which is another topic that I don’t know much about, so I’m going to use this post to explore the basics of recursion. We will talk about what it is and some basic examples of how to use it to solve problems.</p> <p>The basic idea behind recursion is that we break a problem into smaller subproblems many times until the subproblem is so simple that we can directly solve it. A simple example would be using recursion to solve the factorial problem, i.e. to calculate 5!:</p> <p>$$5! = 5 \cdot 4!$$ – we write 5! as a product of 5 and 4!. Computing 4! is a <strong>subproblem</strong> of solving 5! <br /> $$4! = 4 \cdot 3!$$<br /> $$3! = 3 \cdot 2!$$<br /> $$2! = 2 \cdot 1!$$<br /> $$1! = 1 \cdot 0!$$</p> <p>We know that 0! is simply defined as the multiplicative identity, i.e. 1, so now we have a subproblem that we can solve easily. We then work our way back up to compute 5! as follows:</p> <p>$$1! = 1 \cdot 1 = 1$$<br /> $$2! = 2 \cdot 1 = 2$$<br /> $$3! = 3 \cdot 2 = 6$$<br /> $$4! = 4 \cdot 6 = 24$$<br /> $$5! = 5 \cdot 24 = 120$$</p> <p>The solution 0! = 1 is called the <strong>base case</strong> - this is the first case of the problem for which we know the answer by definition. Each case above it in the recursion, starting with 1!, is called a <strong>recursive case</strong>. Notice that, in order for recursion to work, you must not only be able to break the problem into subproblems, but doing so must eventually lead to a base case. There are certain instances where this would not happen (consider computing the factorial of a negative number), where we would never reach a base case, and in these situations, recursion cannot be used.</p> <p>In the example used above, we solved one subproblem at each level of the recursive process. 
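</p> <p>That single-subproblem recursion can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the explicit check for negative inputs is an assumption added here so that the function fails loudly instead of recursing forever when no base case can be reached.</p>

```python
def factorial(n: int) -> int:
    """Compute n! recursively for a non-negative integer n."""
    if n < 0:
        # No base case is reachable from a negative input, so refuse it.
        raise ValueError("factorial is not defined for negative integers")
    if n == 0:
        # Base case: 0! is defined to be 1.
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)


print(factorial(5))  # 120, matching the worked example above
```

<p>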
But it is possible to have multiple subproblems at one level - for example, when drawing a Sierpinski triangle, each level of the recursion requires splitting the 3 black triangles into 4 smaller triangles, as shown in Figure 1. For each level of the recursive algorithm, we are solving 3 subproblems instead of one.</p> <p><img src="/images/2022-01-05-Recursion-fig1.png" alt="Fig 1" title="Figure 1" /> <br /> Figure 1 - Source </p> <h2 id="efficiency-of-recursive-algorithms">Efficiency of Recursive Algorithms</h2> <p>Recursive algorithms are not always efficient in terms of time. To see this in practice, let’s imagine we call a function to compute the factorial of 3 and then of 5. Within the function call <code class="language-plaintext highlighter-rouge">factorial(3)</code>, we must call <code class="language-plaintext highlighter-rouge">factorial(2)</code> and <code class="language-plaintext highlighter-rouge">factorial(1)</code>. For <code class="language-plaintext highlighter-rouge">factorial(5)</code>, we have to call the factorial function 4 times, on {1, 2, 3, 4}. But we already have the values of the factorial function applied to {1, 2, 3} when we computed <code class="language-plaintext highlighter-rouge">factorial(3)</code>, and it is time inefficient to repeat these computations.</p> <p>One way to improve the efficiency of a recursive function is to use a technique called <strong>memoization</strong>. In this technique, we save the distinct outputs of our function calls in a table (also called the “memo” for memoization). So when we computed <code class="language-plaintext highlighter-rouge">factorial(3)</code>, we would have saved the values for the factorial function applied to {1, 2, 3}. Then when we call <code class="language-plaintext highlighter-rouge">factorial(5)</code>, we can first check our table to see if we already have the solutions to some of our subproblems and reuse them so that we reduce computation time. 
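</p> <p>A minimal Python sketch of this idea is shown below. The dictionary name <code class="language-plaintext highlighter-rouge">memo</code> and the function name <code class="language-plaintext highlighter-rouge">factorial_memo</code> are illustrative choices, and a module-level dictionary is only one of several ways the table could be stored.</p>

```python
# The "memo": a table from n to n!, seeded with the base case 0! = 1.
memo = {0: 1}


def factorial_memo(n: int) -> int:
    """Compute n! recursively, reusing any results already saved in memo."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    if n not in memo:
        # Only compute (and save) values we have not seen before.
        memo[n] = n * factorial_memo(n - 1)
    return memo[n]


factorial_memo(3)          # fills in the table entries for 1, 2 and 3
print(sorted(memo))        # [0, 1, 2, 3]
print(factorial_memo(5))   # 120; only 4! and 5! are newly computed
```

<p>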
This technique is not necessarily space efficient, but as long as the memory required to store the table of values is not too expensive, it might be a useful way to speed up the algorithm in certain situations.</p> <p>A second technique for speeding up recursion is to not use recursion at all, and replace it with a <strong>bottom-up approach</strong>. With this approach, we could compute Fibonacci numbers, <code class="language-plaintext highlighter-rouge">fibonacci(n)</code>, as follows:</p> <ul> <li>If <code class="language-plaintext highlighter-rouge">n = 0</code> or <code class="language-plaintext highlighter-rouge">n = 1</code>, return <code class="language-plaintext highlighter-rouge">n</code></li> <li>Else: <ul> <li>Initialize <code class="language-plaintext highlighter-rouge">twoBehind = 0</code> and <code class="language-plaintext highlighter-rouge">oneBehind = 1</code></li> <li>Initialize <code class="language-plaintext highlighter-rouge">result = 0</code></li> <li>Repeat n-1 times: compute <code class="language-plaintext highlighter-rouge">result = twoBehind + oneBehind</code>, then update <code class="language-plaintext highlighter-rouge">twoBehind = oneBehind</code> and <code class="language-plaintext highlighter-rouge">oneBehind = result</code></li> <li>Return <code class="language-plaintext highlighter-rouge">result</code></li> </ul> </li> </ul> <p>This approach requires performing far fewer function calls than a top-down, recursive approach and also requires much less memory. Since this is not a recursive algorithm, we instead call it an example of <strong>dynamic programming</strong>, which I plan to write about in more detail in a separate blog post.</p> <p>As we’ve seen in the previous paragraphs, recursion can be a useful and elegant way to solve a problem, but it also has its drawbacks. Recursion is excellent for traversing <a href="https://sassafras13.github.io/BinaryTrees/">trees</a>, for example. But, as discussed, recursion can be time (and space) inefficient. 
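</p> <p>As one illustration of avoiding that overhead, the bottom-up Fibonacci procedure described earlier might look like this in Python. It is a sketch under a few assumptions: the two trailing values are initialized once, before the loop, and the loop runs n - 1 times so that <code class="language-plaintext highlighter-rouge">fibonacci(2)</code> comes out as 1; the snake_case variable names are adapted from the list above.</p>

```python
def fibonacci(n: int) -> int:
    """Compute the n-th Fibonacci number iteratively (bottom-up)."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n in (0, 1):
        return n
    # Start from F(0) and F(1) and build upward, keeping only two values.
    two_behind, one_behind = 0, 1
    result = 0
    for _ in range(n - 1):
        result = two_behind + one_behind
        two_behind, one_behind = one_behind, result
    return result


print([fibonacci(i) for i in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```

<p>Only a constant number of variables is kept regardless of n, and there is a single function call, which is where the time and memory savings come from.</p> <p>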
It can also be a confusing way to solve a problem, which you may want to avoid if your code is intended to be used by others.</p> <p>Thank you for reading this post, I hope it was helpful. Next time we will talk about dynamic programming in more detail!</p> <h2 id="references">References</h2> <p>“Recursion.” Khan Academy. <a href="https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/recursion">https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/recursion</a> Visited 6 Jan 2022.</p> <p>“Recursive factorial.” Khan Academy. <a href="https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/recursive-factorial">https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/recursive-factorial</a> Visited 6 Jan 2022.</p> <p>“Properties of recursive algorithms.” Khan Academy. <a href="https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/properties-of-recursive-algorithms">https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/properties-of-recursive-algorithms</a> Visited 6 Jan 2022.</p> <p>“Multiple recursion with the Sierpinski gasket.” Khan Academy. <a href="https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/the-sierpinksi-gasket">https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/the-sierpinksi-gasket</a> Visited 6 Jan 2022.</p> <p>Qualman, D. “Sierpinski gasket or triangle fractal collapse.” <a href="https://www.darrinqualman.com/fractal-collapse/sierpinski-gasket-or-triangle-fractal-collapse/">https://www.darrinqualman.com/fractal-collapse/sierpinski-gasket-or-triangle-fractal-collapse/</a> Visited 6 Jan 2022.</p> <p>“Improving efficiency of recursive functions.” Khan Academy. 
<a href="https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/improving-efficiency-of-recursive-functions">https://www.khanacademy.org/computing/computer-science/algorithms/recursive-algorithms/a/improving-efficiency-of-recursive-functions</a> Visited 11 Jan 2022.</p> <p>Sturtz, J. “Recursion in Python: An Introduction.” Real Python. 10 May 2021. <a href="https://realpython.com/python-recursion/">https://realpython.com/python-recursion/</a> Visited 11 Jan 2022.</p> <h1>Backtracking Algorithm</h1> <p>4 Jan 2022. http://sassafras13.github.io/Backtracking</p> <p>I have been working through Leetcode’s <a href="https://leetcode.com/study-plan/algorithm/">Algorithm I study plan</a> and recently came across the <strong>backtracking</strong> set of problems (days 10 and 11). I wanted to give a brief explanation of what backtracking is, and how it uses other CS tools like tree graphs and <a href="https://sassafras13.github.io/DFS-BFS/">depth-first search</a>.</p> <p>The basic idea behind backtracking is that we want to find all possible solutions to a problem using brute force, and then discard all the solutions that don’t meet certain criteria established by the problem definition. As you might imagine, this is not a very efficient algorithm, so do not use it if you need to meet tight time or memory constraints. That being said, let’s take a closer look at how the algorithm works on a toy example.</p> <p>Let’s find all the possible permutations of the numbers {1, 2, 3} where 1 and 3 are not adjacent. 
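</p> <p>Before walking through the search by hand, here is a minimal Python sketch of how this toy problem might be coded with a depth-first search that abandons a partial permutation as soon as 1 and 3 end up next to each other. The function and parameter names are illustrative choices, not taken from the cited sources.</p>

```python
def permutations_without_adjacent(numbers, forbidden_pair):
    """Return all permutations of numbers in which the two values in
    forbidden_pair never sit next to each other."""
    forbidden = frozenset(forbidden_pair)
    solutions = []

    def extend(partial, remaining):
        if not remaining:
            # Reached a leaf of the state-space tree: a complete permutation.
            solutions.append(partial)
            return
        for i, candidate in enumerate(remaining):
            if partial and frozenset((partial[-1], candidate)) == forbidden:
                # Constraint violated: skip this branch and try the next one.
                continue
            extend(partial + [candidate], remaining[:i] + remaining[i + 1:])

    extend([], list(numbers))
    return solutions


print(permutations_without_adjacent([1, 2, 3], (1, 3)))
# [[1, 2, 3], [3, 2, 1]]
```

<p>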
First, we need to find all of the possible solutions to this problem (we’ll eliminate the ones that don’t meet the adjacency constraint in a second step). We can do this using a tree to represent the <strong>state space</strong>, i.e. all the possible solutions. This tree is shown in Figure 1.</p> <p><img src="/images/2022-01-04-Backtracking-fig1.png" alt="Fig 1" title="Figure 1" /> <br /> Figure 1</p> <p>Every branch of this tree represents one of the possible solutions to this problem (i.e. a unique state). We can use depth-first search (DFS) to list out all the possible solutions: [1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]. (Notice that there are 6 solutions and 6 branches in the tree.) But what do we do about the fact that some of these solutions place 1 and 3 next to each other? We could add a check in our DFS algorithm such that if the constraint is violated (e.g. we find [1, 3 …]), then we stop searching along this branch and step back up the tree (backtrack) to try the next branch instead. This method would leave us with the 2 viable solutions: [1, 2, 3] and [3, 2, 1].</p> <p>We can use backtracking to solve several kinds of problems, including <strong>decision problems</strong> (where we are trying to answer a yes/no question), <strong>optimization problems</strong> (where we are trying to find the <em>best</em> answer to a problem with multiple solutions) or <strong>enumeration problems</strong> like the one above where we are finding a set of permutations/solutions to a problem.</p> <h2 id="references">References</h2> <p>“Algorithm I Study Plan.” Leetcode. <a href="https://leetcode.com/study-plan/algorithm/">https://leetcode.com/study-plan/algorithm/</a> Visited 4 Jan 2022.</p> <p>Datta, S. “Backtracking Algorithms.” Baeldung CS. 26 Nov 2020. <a href="https://www.baeldung.com/cs/backtracking-algorithms">https://www.baeldung.com/cs/backtracking-algorithms</a> Visited 4 Jan 2022.</p> <p>“Decision problem.” Wikipedia. 
<a href="https://en.wikipedia.org/wiki/Decision_problem">https://en.wikipedia.org/wiki/Decision_problem</a> Visited 4 Jan 2022.</p> <h1>Reflections and New Goals for 2022</h1> <p>4 Jan 2022. http://sassafras13.github.io/Reflections</p> <p>As we begin a new year, I wanted to take a moment to reflect on the things I learned by doing research in 2021, and how I would like to improve in 2022. I have a list of some things that I have learned in 2021, and another list of things that I want to do better in 2022. I’m going to focus on writing about doing good research and getting the most out of the PhD program.</p> <h2 id="things-i-learned-in-2021">Things I Learned in 2021</h2> <p><strong>Write every day.</strong> A friend of mine shared this excellent pair of <a href="https://twitter.com/jbhuang0604/status/1419880122006519809?s=21">Twitter</a> <a href="https://twitter.com/emtiyazkhan/status/1420187521900711937?s=21">threads</a> with me this past summer. They are about how to get unstuck in your research and are absolutely full of useful tips and perspectives. One such tip was to write every day and maintain one document that is a living manuscript for your project (and the eventual paper that will emerge from it). I started (and stopped, and started again…) writing more regularly during the fall and I noticed that I definitely felt like I knew what was going on and what progress I had made when it was clearly written down. There is something about writing that feels more complete and detailed than just copying plots onto Google Slides to track my progress as I go. 
I don’t think using Slides is a bad thing, but I have found that writing everything demands that I either clearly state my progress or identify gaps in my work that must be filled.</p> <p><strong>Yes, you should be here.</strong> Recently I have been working on a project that excites me, but also is in an area where I do not feel like I belong. I try not to let that bother me but I have found that when I get results that are unexpected or questionable, then my fear of not being good enough rears its ugly head and brings up negative self-talk and a lack of faith in my own results. But often when I get the courage to come back and look at my results later, and examine the mathematics behind them, I am able to interpret them and figure out what is real and what is a bug. I think I will be working on fighting this demon for a long time, but I need to remind myself that I know enough to either get the right results or figure out why they’re wrong, and I don’t need to give up before I’ve even tried to solve the problem.</p> <p><strong>Can I make it simpler?</strong> This is a great piece of advice from one of our collaborators who was helping me figure out <a href="https://sassafras13.github.io/MatlabModeling/">how to debug a dynamics model</a> I was writing. She would always make suggestions about how I could simplify the model, or perform simple but effective tests to make sure it was correct. And when I followed her advice, I found that all of these simplifications helped me to find bugs in my code and gain key insights into how the model worked, which together formed an evidence-based understanding of my system.</p> <p><strong>Go back to the math.</strong> Sometimes I got results that really did not make sense to me, and in those times it helped to return to the basic math and physics behind the model I was building. 
There I could ask what happened if certain terms in the equations of motion were present or missing, or written with respect to this or that reference frame. The math does not lie and this approach helped me justify the results I was seeing.</p> <p><strong>Really listen to all of your advisors’ comments in meetings, and learn to catch yourself when you pass over something they say.</strong> Looking back on my progress this semester, I have realized that there were several times when a comment one of my advisors made in a meeting later turned out to be critical in helping me make progress. I feel really bad for not having paid more attention to those comments when they were made, but I’ve also asked myself why I didn’t give the suggestion more attention when it was first made. I think that I will often seize on ideas that make sense to me, and put suggestions whose value isn’t immediately obvious to me further down on my priority list. But I’ve learned that I am not the best judge of which comments are most valuable, so I think one way to address this shortcoming is to take a little more time to mull over and discuss all of my advisors’ suggestions as they make them, so that I can more intelligently choose which ones to act on first.</p> <p><strong>The question “how can I improve for next time?” is one of your most powerful tools.</strong> I learned to do this from a good friend and colleague of mine: at the end of every meeting with my advisors, I ask them “how can I improve for next time?” I try to keep the question open-ended: depending on the week, my advisor might have feedback on the way I am approaching the project, or they might have suggestions on how to improve my plots and presentation, or they might bring up concerns they have about the next step in the project. 
I have also learned to be patient and leave a little bit of silence after my advisor thinks of one thing to say - often they are switching gears into a more critical perspective and will follow up that first comment with several more suggestions for improvement. And since I was the one who requested the feedback, I leave feeling obligated to act on it, which pushes me to make significant improvements week to week.</p> <h2 id="things-to-improve-in-2022">Things to Improve in 2022</h2> <p><strong>Read more!</strong> While I enjoy reading, I find that reading academic papers can be very daunting, and I often put off doing that work until I need to write the literature review section of a paper. This isn’t good practice, and I recently read an article that described reading papers as a way of taking part in the conversation within the scientific community. In fact, the article reported that a recent study found that the scientists who published the most also read the most articles per month on average. So in 2022, I’m going to try to read papers much more frequently.</p> <p><strong>Books exist in the scientific community, too.</strong> My previous point focused on academic papers, but I tend to forget that there are a lot of great books that carry the foundations of different disciplines as well. I want to make some time to read some books for research as well this year.</p> <p><strong>Talk to other people more.</strong> This is probably obvious to everyone, but it turns out that you can learn a lot by talking to other people! I am an introvert so I do not naturally gravitate towards socializing with others but in 2021 I did get to meet people through internships, classes and even by cold-calling people who I thought did interesting research. Thinking back on this year, some of the best things I learned came from random conversations with people at office hours or during pair programming sessions. 
I also found that going out for drinks with my peers and chatting about our lives in and out of the lab gave me a chance to reflect and get encouragement to go after things I want. I plan to do more of this in 2022.</p> <p><strong>Maintain a steady diet of new ideas.</strong> I have started to do this in 2021 - I was fairly good about attending the <a href="https://www.cmu.edu/aced/sciML.html">SciML lecture series</a> here at CMU, for example, which gave me a weekly opportunity to see a new ML/AI model and understand where and why it is useful. In fact, some of the things I saw at SciML gave me inspiration for my class project for 10-708: Probabilistic Graphical Models, and helped me network more effectively at NeurIPS. But I have a lot more to do here. I want to really start reading papers from different fields regularly, and attend different seminar talks more regularly. I think that one thing that I will need to do to be successful here is to be okay with just getting exposure to new ideas without fully understanding all of them. I think the first step in exploring new ideas is just to know that they exist.</p> <p><strong>Try using other papers’ codebases and models more often.</strong> This fall was the first time I picked up someone else’s codebase and used it, and it was really fun! I was able to set up a conda environment with the right version of Python, import all the right libraries and follow the README to get the code up and running on some toy datasets. That was a minor breakthrough for me and made me feel more comfortable about engaging with other folks’ work. By bringing my own data to someone else’s model and looking at the training results, I got to ask big questions like: What can be learned from this data? What inductive biases do I wish this model had? I also got practice in properly training a model - I had to think about train/validate/test data splits, how to normalize or standardize my data, and how to evaluate my model. 
I think all of this was great practice, resulted in some useful insights and came at a cheaper cost than completely building a model from scratch myself.</p> <p><strong>Constantly improve my visual design skills.</strong> Few things seem as effective in science communication as really good visual aids. I have come to realize there is a huge advantage in being able to generate information-rich graphics and videos to communicate your science, and I want to level up this skill set in 2022. I want to focus on building excellent GIFs that compare models to experiments, that show dynamic systems in action, that highlight key takeaways without me having to do any talking whatsoever. I think this is going to be a skill that will pay huge dividends in my career.</p> <p><strong>Continue to work on my coding skills.</strong> This year I started solving problems on Leetcode. As with a lot of other things I’ve tried to do this year (difficult machine learning courses, daunting research projects, etc.), it has been a slog. But I’ve started to get a little faster at solving them and they keep me coding regularly. I practice good documentation skills while I work on them, too. I want to continue to do Leetcode problems in 2022, and also to look for other opportunities to improve my coding skills (<a href="https://goodresearch.dev/index.html">this</a> looks like one good resource!).</p> <p><strong>Look at challenges as opportunities, not quagmires.</strong> I think my greatest fear is running into problems I cannot solve. This holds me back a lot because I don’t charge into trying new things because they have a high probability of containing difficult things. But that’s the whole point of doing research! 
So in 2022 I want to work on leaving that fear behind and remind myself to do the research I would do if I knew I could not fail.</p> <p><strong>Consider many different approaches to solving a problem before actually solving it.</strong> Again, the work I was doing this past fall was very challenging. One of the things that made it so challenging was the fact that I did not spend time up front considering which problem-solving approach would be the best - I just dove straight into trying to solve the problem. I should have stepped back and considered what information I had, what strategies I knew (or could learn), and how I could check that my approach was working. I usually hesitate to do this because I’m an impatient person and I want to always be <strong>doing something</strong>, but I think taking some time up front to really think through my options might have saved me months of struggle.</p> <p><strong>Just keep going.</strong> This also came from one of our collaborators, who said that at the end of the day, being persistent and energetic is key. The rest will come as long as you keep trying. I will hold onto this advice in 2022 and just keep going!</p> <h2 id="references">References</h2> <p>Nassi-Calo, L. “Researchers reading habits for scientific literature.” 3 April 2014. SciELO in Perspective. <a href="https://blog.scielo.org/en/2014/04/03/researchers-reading-habits-for-scientific-literature/#.YdPLQXXMI_C">https://blog.scielo.org/en/2014/04/03/researchers-reading-habits-for-scientific-literature/#.YdPLQXXMI_C</a> Visited 3 Jan 2022.</p>
ODE Solvers in MATLAB 2021-12-17T00:00:00+00:00 http://sassafras13.github.io/ODEsolvers <p>In this post I am going to write about solving ordinary differential equations (ODEs) in MATLAB. I wanted to explore this area because I use MATLAB’s ODE solvers all the time, and I wanted to capture the details of how they work, when different solvers are appropriate and what parameters are available for tuning. I’m going to stay at a somewhat high level when it comes to the details of different ODE algorithms, so if you want more details on, for example, the steps in the Runge-Kutta method, please see the references at the end of this post. Let’s begin!</p> <h2 id="how-ode-solvers-work">How ODE Solvers Work</h2> <p>MATLAB offers a <a href="https://www.mathworks.com/help/matlab/math/choose-an-ode-solver.html">suite</a> of different ODE solvers. They vary in implementation and advantages, but in general they are all solving equations of the form:</p> $\frac{dy(t)}{dt} = f(t, y(t))$ <p>With initial conditions $$y(t_0) = y_0$$. These solvers output a time series of values for $$y(t)$$ by taking small steps, $$h$$, in the direction of the gradient, $$\frac{dy(t)}{dt}$$, starting at the initial conditions, $$y_0$$. We can write this as:</p> $y(t + h) = y(t) + \int_t^{t+h} f(s, y(s)) ds$ <p>The simplest algorithm for solving this problem numerically is called <strong>Euler’s method</strong>. It uses a fixed step size:</p> <p>$$y_{n+1} = y_n + h f(t_n, y_n)$$ <br /> $$t_{n+1} = t_n + h$$</p> <p>This method is not particularly efficient, and requires very small steps for accuracy. The step size is also fixed, and the choice of step size, at the moment, is rather arbitrary - we don’t have a good sense for what an appropriate size of $$h$$ would be. 
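</p> <p>Here is a minimal sketch of Euler’s method in MATLAB, to make the update rule above concrete. The test problem $$\dot{y} = -2y$$ with $$y(0) = 1$$ is an assumption chosen only because its exact solution, $$y(t) = e^{-2t}$$, is known, so we can see how the error at the final time changes as the fixed step size shrinks:</p> <pre><code class="language-Matlab">% Forward Euler for dy/dt = f(t, y), y(t0) = y0, with a fixed step size h.
% The test problem dy/dt = -2*y is an assumption used for illustration;
% its exact solution is y(t) = exp(-2*t).
f  = @(t, y) -2*y ;     % right-hand side f(t, y)
t0 = 0 ;  tf = 1 ;      % integration interval
y0 = 1 ;                % initial condition

for h = [0.1 0.01 0.001]
    N = round((tf - t0)/h) ;    % number of fixed-size steps
    t = t0 ;  y = y0 ;
    for n = 1:N
        y = y + h*f(t, y) ;     % y_{n+1} = y_n + h f(t_n, y_n)
        t = t + h ;             % t_{n+1} = t_n + h
    end
    err = abs(y - exp(-2*tf)) ; % error at the final time
    fprintf("h = %g, steps = %d, error = %.2e\n", h, N, err) ;
end
</code></pre> <p>Because Euler’s method is only first-order accurate, the reported error should shrink roughly in proportion to $$h$$, which is why so many steps are needed for an accurate answer.</p> <p>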
An important improvement to Euler’s method is including a way to estimate the error in this algorithm - this helps us understand how well our numerical integration is doing and how to change our step size to improve our performance.</p> <p>This leads us to the development of <strong>single-step methods</strong> (or <strong>Runge-Kutta methods</strong>), which compute several values for $$f(t, y)$$ in the interval $$[t_n, t_{n+1}]$$. We use a linear combination of these intermediate values to compute the value $$y_{n+1}$$ and take a step in that direction. Classical Runge-Kutta methods do not actually include a way to compute the error, either, but MATLAB implements all of its ODE solvers (many of which use Runge-Kutta methods) with an error computation included in the function. This is important to know because the error is used to determine whether to accept the integration step, or reduce the step size and try again. The user can define the threshold above which the error is too high and we reduce the step size.</p> <p>If you’re familiar with MATLAB’s ODE solvers, you may already know that there are 2 error parameters that we can set: <strong>relative tolerance</strong> and <strong>absolute tolerance</strong>. The definition of each one is given below:</p> <p><code class="language-plaintext highlighter-rouge">RelTol = abs(X - Y) / min(abs(X), abs(Y))</code><br /> <code class="language-plaintext highlighter-rouge">AbsTol = abs(X - Y)</code></p> <p>The key reason we have 2 parameters is that if the value of either X or Y becomes very small, the relative error defined above grows without bound, so the ODE solver switches to using the absolute tolerance as the cutoff for the error in that situation. These are 2 parameters that you can manipulate when you are trying to optimize your ODE solver for your particular application.</p> <p>In fact, MATLAB’s naming convention reflects how it computes the error for each ODE solver. 
The general format is <code class="language-plaintext highlighter-rouge">odennxx</code> where the <code class="language-plaintext highlighter-rouge">nn</code> digits indicate the orders of the methods used to perform the integration, and the <code class="language-plaintext highlighter-rouge">xx</code> suffix, when used, indicates other special properties of that function. So if we consider our favorite <code class="language-plaintext highlighter-rouge">ode45</code>, the naming convention indicates that the function computes the error by comparing the results of a 4th order method with a 5th order method .</p> <h2 id="stiffness">Stiffness</h2> <p>So far we have seen that all of MATLAB’s ODE solvers are taking steps along a gradient towards the solution of an ordinary differential equation over time. We know that the relative and absolute tolerances are important parameters we can control to limit the maximum error we allow during the integration process. Another important aspect to solving ODEs that we need to think about is how <strong>stiff</strong> our system is. This is a slightly difficult concept to grasp, but let’s try. Moler describes a system as being stiff if “the solution being sought varies slowly, but there are nearby solutions that vary rapidly, so the numerical method must take small steps to obtain satisfactory results” .</p> <p>Moler also uses a nice metaphor to explain stiffness from an intuitive standpoint. Trying to solve a stiff system is like following a trail in the dark along the bottom of a narrow valley with very high walls. The naive approach is to wander in a random direction and let the gradient of the valley push you towards the trail. Unfortunately, this is not very efficient as you will likely weave back and forth across the trail and up and down the sides of the valley. 
This is a stiff problem, and a better way to solve it is to use a flashlight to look ahead of you and see where the trail is and follow it without oscillating up and down the valley walls. The MATLAB equivalent of the flashlight is a subset of ODE solvers that are designed to predict how the solution will continue to evolve and use that information to take steps in the right direction without oscillating via the gradient.</p> <p>A good example of the benefit of using a stiff ODE solver is to try to solve a simple model for flame propagation: <br /> $$\dot{y} = y^2 - y^3$$ <br /> $$y(0) = \delta$$ <br /> $$0 \leq t \leq 2/\delta$$</p> <p>For large values of $$\delta$$ (the initial radius of the ball of flame), this is not a very stiff problem, but as this initial radius gets smaller, the system does become more stiff. See this for yourself by running the code below. Try varying the value of <code class="language-plaintext highlighter-rouge">delta</code> to see how it makes the problem more stiff, and how <code class="language-plaintext highlighter-rouge">ode45</code> really slows down when it tries to solve the stiff version of the problem. If you then switch over to <code class="language-plaintext highlighter-rouge">ode23s</code> you will see that this solver is much faster at finding the solution.</p> <pre><code class="language-Matlab">delta = 0.01 ;        % initial radius: not very stiff
%delta = 0.00001;     % uncomment for a much stiffer problem
F = @(t,y) y^2 - y^3;
opts = odeset("RelTol",1.e-4);
ode45(F,[0 2/delta],delta,opts) ;      % no outputs requested, so MATLAB plots the solution as it runs
%ode23s(F,[0 2/delta],delta,opts);     % uncomment to try the stiff solver instead
</code></pre> <p>As you can see from this example, knowing whether or not your system is stiff is important because it helps you to select the correct solver for your application. (We call ODE solvers that are designed for stiff systems <strong>implicit solvers</strong>.)</p> <h2 id="numerical-precision">Numerical Precision</h2> <p>So now we know that we can vary the relative and absolute error tolerances, and the type of solver we use. 
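</p> <p>As a rough sketch of how these choices interact, the loop below reruns the flame problem from the previous section with two solvers and two relative tolerances, and reports how many time points each run returns and how long it takes. The values of <code class="language-plaintext highlighter-rouge">delta</code>, <code class="language-plaintext highlighter-rouge">RelTol</code> and <code class="language-plaintext highlighter-rouge">AbsTol</code> here are assumptions picked for illustration, not recommendations:</p> <pre><code class="language-Matlab">% Compare solver and tolerance choices on the flame problem.
% delta, the RelTol values and AbsTol are assumed values for illustration.
delta = 1e-3 ;                         % initial radius
F = @(t, y) y^2 - y^3 ;
solvers = {@ode45, @ode23s} ;          % one non-stiff and one stiff solver
relTols = [1e-3 1e-6] ;

for i = 1:numel(solvers)
    solver = solvers{i} ;
    for j = 1:numel(relTols)
        opts = odeset("RelTol", relTols(j), "AbsTol", 1e-8) ;
        tic ;
        [t, y] = solver(F, [0 2/delta], delta, opts) ;
        elapsed = toc ;
        fprintf("%s, RelTol = %g: %d time points, %.3f s\n", ...
            func2str(solver), relTols(j), numel(t), elapsed) ;
    end
end
</code></pre> <p>The exact counts and timings depend on your machine and MATLAB version, so treat the printout as a relative comparison between runs rather than a benchmark.</p> <p>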
We’ve also talked about how we determine the size of the integration step based on the error computed by the ODE solver. Let’s revisit the step size in a little more detail.</p> <p>First of all, you can set a maximum on the initial step size that your solver will use with the <code class="language-plaintext highlighter-rouge">InitialStep</code> option in the <a href="https://www.mathworks.com/help/matlab/ref/odeset.html"><code class="language-plaintext highlighter-rouge">odeset</code> command</a>. You will likely find that the smaller the initial step size, the slower your code will run. But you may also want to think about numerical precision as you choose your step size and solve ODEs. Specifically, it is important to note that MATLAB uses double-precision floats by default, which carry roughly 16 significant decimal digits (<code class="language-plaintext highlighter-rouge">eps</code> is about 2.2E-16), and if you start to venture beyond that point, you will likely see numerical instability in your code. For example, if the difference between two numbers of order 1 is on the order of 1E-16, MATLAB may not return an accurate result. This is just a reminder that if your code is generating numbers this small, you may want to consider <a href="https://en.wikipedia.org/wiki/Nondimensionalization">non-dimensionalizing</a> your code to bring all of the values in your solution closer to 1.</p> <h2 id="closing-thoughts">Closing Thoughts</h2> <p>In this post, we’ve discussed some of the pros and cons of choosing different ODE solvers in MATLAB, and some of the parameters available to us for tuning our code. In closing, I just wanted to suggest that it may be easiest to simply write code to try a range of different solvers, error tolerances and step sizes and see which combination best suits your particular application. You can evaluate the results based on how smooth they are, how close the solutions are to the correct answer (if you know what it is), and how long it takes to get there. I hope this post helps, and happy coding!</p> <h2 id="references">References</h2> <p>Moler, C. 
“Numerical Computing with MATLAB.” MathWorks, 2004. <a href="https://www.mathworks.com/moler/chapters.html">https://www.mathworks.com/moler/chapters.html</a> Visited 17 Dec 2021.</p> <p>Jan. “Absolute and relative tolerance definitions.” MATLAB Answers. 22 Jan 2012. <a href="https://www.mathworks.com/matlabcentral/answers/26743-absolute-and-relative-tolerance-definitions">https://www.mathworks.com/matlabcentral/answers/26743-absolute-and-relative-tolerance-definitions</a> Visited 17 Dec 2021.</p> <p>D’Errico, J. “Unstable derivative approximation when steps get too small.” MATLAB Answers. 2 Jan 2020. <a href="https://www.mathworks.com/matlabcentral/answers/498726-unstable-derivative-approximation-when-steps-get-too-small">https://www.mathworks.com/matlabcentral/answers/498726-unstable-derivative-approximation-when-steps-get-too-small</a> Visited 17 Dec 2021.</p> Tips for Writing Dynamics Models in MATLAB 2021-12-15T00:00:00+00:00 http://sassafras13.github.io/MatlabModeling <p>This blog post has been a long time in the making. I have been spending many hours this semester writing mathematical models in MATLAB for a research project. During this time, I have encountered a lot of challenges that I had never really come up against in my dynamics and controls courses. 
I started this project feeling poorly equipped to deal with those challenges, but I have tried to develop a couple tricks and tips that make it easier to write and test different models. This blog post will cover some of those tips. I really hope this post helps some other people because I was very frustrated by this process and it would be great if I could save someone else some aggravation.</p> <p>First, let me set the scene so we have some context for what my project entailed, and then I will start listing some specific tips. I wanted to write a model that captured the dynamics of my system from first principles, similar to how we might model canonical systems like a <a href="https://ctms.engin.umich.edu/CTMS/index.php?aux=Activities_Pendulum">pendulum</a>. I wanted to simulate how the model would respond to various control inputs using MATLAB’s ode45 tools. I used analytical tools like looking at stability via the limit cycle and net displacement via <a href="https://sassafras13.github.io/CVF/">connection curvature functions</a> to further understand my system. I also wanted to test the model with different parameters and to try writing the model using different approaches to check I was correct.</p> <p>I am going to break up the tips for working through a process like this into a couple of different steps. First, we’ll talk about how to write the basic dynamics model. Next, we’ll discuss how to test the model we wrote and make sure it’s doing what we expect it to do. Finally, I will end with some tips on how to analyze your model and get the answers you need.</p> <h2 id="writing-a-dynamics-model-in-matlab">Writing a Dynamics Model in MATLAB</h2> <p>In this section I’ll share some tips for how to derive your mathematical model in MATLAB. A lot of this discussion is going to center around LiveEditor scripts, which I absolutely love to use. 
I would argue that it should become an academic standard that any time you publish a new model in your paper that you include a LiveEditor script with your derivation and your code in your paper supplementary materials!! Begin the campaign!!</p> <p><strong>1. Use LiveEditor scripts.</strong> MATLAB has an equivalent document format to Jupyter notebooks in Python, called LiveEditor scripts. This document format allows you to mix text, LaTeX equations, images and code all in one place, and it is my primary tool for communicating my model derivation as I work on it. I like using it because I can write out the mathematical equation and then on the very next line demonstrate how that equation would be written in code. I also like it because I can save the LiveEditor scripts as PDFs (and possibly as Markdown files too?) and share them with others even if they don’t have a MATLAB license.</p> <p><strong>2. Use the symbolic toolbox.</strong> MATLAB includes a symbolic math toolbox which you can use to write expressions without assigning real values to every single variable. It even allows you to take derivatives, integrate, compute cross products, etc. I write my entire model with the symbolic toolbox and then save the final expressions as MATLAB functions that I can call in regular scripts.</p> <p><strong>3. But don’t get too comfortable with symbolic math.</strong> I am very guilty of letting MATLAB do all my math for me. I mean, why should I have to do all the integration and derivatives if the symbolic toolbox can do it for me? Who do you think I am? Someone who remembers calculus? But the thing is, you <strong>need</strong> to check your work and make sure you trust it. I’m going to touch on this much more later, but when you are modeling a new system for which you do not have a lot of intuition or understanding yet, you should check your math by hand. This also helps you get a better idea of how your model works.</p> <p><strong>4. 
Make sure to set your symbolic variables to <code class="language-plaintext highlighter-rouge">real</code>.</strong> I ran into this problem with a friend of mine when we were both writing code for the same model but only one of us had added the <code class="language-plaintext highlighter-rouge">real</code> flag to the end of our declaration of symbolic variables. If you do not add it, you may find that your function returns complex numbers in cases when you only wanted the real part. An example of the proper code for this situation would be:</p> <pre><code class="language-Matlab">syms x y z 'real' % the real flag forces these variables to be real-valued
</code></pre> <p><strong>5. Break your LiveEditor scripts into pieces and add numeric prefixes to the filenames to keep them in order.</strong> This might be a problem unique to me because I have an older laptop with a tired CPU, but I do find that my LiveEditor scripts really slow down if there are a lot of lines of text or a lot of computations in a single document. To combat this, I break the document up into multiple LiveEditor scripts and save them with a numeric prefix like <code class="language-plaintext highlighter-rouge">01_setup.mlx</code>, <code class="language-plaintext highlighter-rouge">02_forces.mlx</code>, etc. I think MATLAB might complain about the numeric start to the filenames, but you can override it and it seems to work fine anyway.</p> <p><strong>6. Declare variables once and run LiveEditor scripts in order.</strong> If you do follow tip #5 above, you will need to run the scripts in order so that all the variables required in a later script have already been created by the earlier scripts. I have found two ways to make sure this happens: the first is just to remember to always run the scripts in order. The second is to add commands to run the earlier scripts at the top of later scripts like this:</p> <pre><code class="language-Matlab">01_setup ; % runs file 01_setup.mlx
</code></pre> <p><strong>7. 
Leverage all the useful symbolic toolbox features.</strong> Again, you should always check what the symbolic toolbox does, but you can get it to do some pretty cool things like <code class="language-plaintext highlighter-rouge">collect</code> terms, find the coefficients for a specific variable using <code class="language-plaintext highlighter-rouge">coeffs</code>, integrate and differentiate. Also be sure to use <code class="language-plaintext highlighter-rouge">simplify</code> to clean things up if they are really messy.</p> <p><strong>8. You can take time derivatives with the symbolic toolbox.</strong> At this point some academics are probably pretty angry that I am using the symbolic toolbox for everything, but if you have really hairy functions, you are more likely to make mistakes doing the derivatives by hand than if you used the symbolic toolbox, honestly. Please remember to use your best judgment, but here’s a toy example for how I was able to compute time derivatives of functions using symbolic math:</p> <pre><code class="language-Matlab">% create symbolic variables for x (position) and xdot (velocity)
% these variables are not dependent on time
syms x xdot 'real'

% define function y(x, t) for which we will compute time derivative
y = x^2 ;

% define the symbolic variables x_t and dx_t
% these variables are dependent on time
syms t x_t(t) 'real'
dx_t = diff(x_t, t) ;

% replace the value x with x_t to create a time-dependent version of
% the function y, called y_t
y_t = subs(y, x, x_t) ;

% take the time derivative of y_t to get dy_t, also a function of time
dy_t = diff(y_t, t)

% replace x_t and dx_t with the symbolic variables that are
% not functions of time, x and xdot
dy = subs(dy_t, [x_t, dx_t], [x, xdot])
</code></pre> <p>Now that we have talked about writing the derivation for your model, let’s assume that you have now got your function(s) for the dynamics of your system and you are ready to start simulating and analyzing your model.</p> <h2 
id="writing-modular-code">Writing Modular Code</h2> <p>A lot of the tips in this section assume that you may have multiple versions of a dynamics model that you want to compare against each other. Following our pendulum example, this might be true if you want to see what happens if you add a torsion spring to your pendulum, or if you distribute the mass along the length of the pendulum instead of writing it as a point mass at the end. In such cases, these tips will really help you keep track of everything. And even if you have only one model that you have written, these tips will hopefully be relevant.</p> <p>The big thing I want to emphasize in this section is that writing modular code is going to be essential. As I hope these tips will prove to you, writing each piece of code only once and reusing it via function calls will be critical to making sure you are properly testing your model. Similarly, modular code will allow you to define all your model parameters once and load them as many times as you like with different versions of your model, so you can test how your models perform without worrying that the parameter values have differed from test to test. Let’s see how this works in more detail.</p> <p><strong>9. Write a script that contains all your model parameters and saves them in a .mat file.</strong> Your dynamics model probably requires some parameters to be defined - for our example of a pendulum dynamics model, we would need to define the length and mass of the pendulum. If I am going to compare different models that all need these same parameters, it makes sense that we write them into a separate script that we can call repeatedly. Typically I write a regular script to define all my parameters, and then use the <code class="language-plaintext highlighter-rouge">save("myfile.mat")</code> command to save the variables as a .mat file. 
I can then load this file in a separate script that is going to simulate the dynamics with <code class="language-plaintext highlighter-rouge">load("myfile.mat")</code>. Those variables are immediately loaded into your workspace and you’re ready to go.</p> <p><strong>10. Use folder structures to your advantage.</strong> One characteristic of good code bases is that all the scripts are in subdirectories that make sense, and the overall repo structure is logical and ordered. For example, in my <code class="language-plaintext highlighter-rouge">\src\</code> directory I like to have subdirectories for <code class="language-plaintext highlighter-rouge">\utils</code> (functions that all my scripts use), <code class="language-plaintext highlighter-rouge">\parameters</code> (parameters of my model that I want to reuse, per tip #9), and <code class="language-plaintext highlighter-rouge">\model</code> (for different models I am trying out). You can do anything you want, but having a structure like this can really help keep things organized. Use the <code class="language-plaintext highlighter-rouge">fileparts</code> and <code class="language-plaintext highlighter-rouge">addpath</code> functions in MATLAB to add any subdirectories you need to the MATLAB search path so that MATLAB knows to look there for those functions when running your code.</p> <p><strong>11. You can look at intermediate values computed inside your ode45 call to help with debugging.</strong> Let’s say that you are computing the trajectory of your pendulum, but you would like to know the torque acting on the pendulum during the simulation, perhaps to help with debugging or to better understand your system. You can actually write your dynamics function to return this torque as an additional output and rerun the dynamics function after the ode45 call to get these intermediate values. 
Consider this toy example:</p> <pre><code class="language-Matlab">% original call to ode45; q has one row per time point
[t,q] = ode45(@(t, q) dynamics(t, q),Tspan,q0) ;

% call your dynamics function again on each (time, state) pair from the
% output above; x is one row of q, transposed back into a column vector
[~,torque] = cellfun(@(t,x) dynamics(t,x.'), num2cell(t), num2cell(q,2),'UniformOutput',false);
torque = cell2mat(torque) ;
</code></pre> <h2 id="testing-your-model">Testing Your Model</h2> <p>The next series of tips is specifically about testing your model. Before I share them, let me get on my soapbox for a minute and talk about why testing your model and requisite functions is so important. In research, we are often developing <strong>new</strong> models for things that we have never studied or built before. This means we are venturing into uncharted territory, and we may not have the intuition to know at a glance if our model is working as it should, or if it has a bug. For example, I know how a simple pendulum should behave but if we are trying to model a 3 link pendulum with torsion springs at all the joints, I cannot really imagine in my head what that system is going to do. Our model is the flashlight we are pointing into the darkness of this unknown design space, and we have to make sure it is working properly. You cannot just write a model, say “yup, thanks to my excellent physics skills I’m confident this works” and then start simulating and analyzing right away. You will want to prove to yourself and others that this model really does do exactly what you expect. This series of tips is going to share a couple of approaches for building a body of evidence that convinces everyone that your model is 100% correct.</p> <p><strong>12. Test functions that perform specific mathematical operations against examples from a textbook.</strong> Part of my modeling activities included implementing a lot of mathematical operations for robot kinematics, and I was using the textbook by <a href="https://sassafras13.github.io/MLSBasics/">Murray, Li and Sastry</a> as a reference. 
Every time I implemented a function like computing a matrix exponential, or hatting a twist, or computing a Jacobian, I ran that function on an example that I pulled from their textbook. I did this in a LiveEditor script so I could include a screenshot of the solution directly from the textbook to show that my function output the correct answer. This doesn’t help prove your entire model is correct, but it will make you more confident that your component functions are working properly.</p> <p><strong>13. Design simple tests to check if your model obeys simple laws of physics.</strong> I really have to thank one of my advisors for this tip. If you have a complicated system (like that 3 link pendulum with torsion springs everywhere), think about the simplest possible tests you could run on your model to make sure it did simple things correctly. For example, if you displace one link from its at-rest position, does it fall back to its at-rest position and stop moving? If all the torsion springs had spring constants of zero, do the links just hang along the vertical axis? This may seem trivial, but if your model <strong>can’t</strong> do this, then you know you have problems that you need to address. The other nice thing about using this strategy is that it’s easy to know what the model <strong>should</strong> do in such situations - this might not be true for more complicated situations.</p> <p><strong>14. Use experimental data as ground truth.</strong> Of course, you may reach a point in your testing process where you are confident the model performs correctly in simple tests, but you want to see if it does the right thing in more complex situations. If possible, consider building a quick and cheap prototype that you can use to study the system’s response to different inputs. Capture video of your experimental system and use OpenCV to track the motion of all the links to build up a body of ground truth data that you can use to validate your model. 
(Using OpenCV in Python might sound challenging but there is lots of documentation available online and I found that I was able to go from never having used it to getting useful data in less than a day’s work thanks to the Internet.)</p> <p><strong>15. Build up from a model you already have or trust to the model you want.</strong> If you’re working on a project where someone else has already written a model similar to yours, consider using their model as a starting point and modify it until it has all the features you need. Again, we are exploring the unknown here and someone else’s model is like a stepping stone that helps you start to cross the stream. However, I should also caution that this process may make things more complicated if you need to modify the model in ways that generate very different results. It may become difficult to compare the models and you may create more confusion than if you had started from scratch.</p> <h2 id="getting-the-answers-you-need">Getting the Answers You Need</h2> <p>Okay, so at this point let’s assume that you’ve built a dynamics model for your system and you have rigorously tested it. Now you want to analyze it with cool techniques like plotting the limit cycle or generating constraint curvature functions. Let me share a few final tips on how to conduct the analytical process for great success.</p> <p><strong>16. Write a new script for each analytical process.</strong> This might seem like a no-brainer, but have separate, well-named scripts for each analysis you are going to conduct. Since you already have modular code with parameters that you can load from a .mat file, you can perform different analyses on the same model with the same parameters easily. I also like to write code to save figures from each analytical process in these scripts. 
Ideally, each script should output a figure that you use in your paper, and other researchers looking at your code base should be able to recreate all your figures from these scripts.</p> <p><strong>17. If you want to run analyses that generate lots of data or figures, consider automating directory creation and file saving to organize the output of this analysis.</strong> To give you a concrete example here, let’s say that we want to see how our pendulum model will perform for different combinations of lengths and masses. In your analysis script, you can create subdirectories and save plots and data files to these subdirectories automatically. You can run the script while you’re out for lunch and then come back and look at all of the resultant plots that have already been organized into subfolders for you.</p> <p><strong>18. Save figures as .fig files so you can continue to edit them later.</strong> I often find that it takes me many iterations before I have a figure that I really like, but if I save my figures as .png files then I cannot edit those files later, and instead I have to re-run all the code used to make that figure before I can edit it. That’s lame and a waste of your precious time, so instead save your figures as .fig files, which is the native MATLAB figure format (or save your figures twice, once as .fig and once as .png if you want!). Now you can always reload these figures to your workspace and edit them with MATLAB’s figure editor.</p> <p><strong>19. Building animations of your dynamic systems is almost always a worthwhile investment of your time.</strong> This might be one of my most important tips: if you want people to really understand what you are doing, building information-rich animations is the best way to get them to quickly grok what you are doing. MATLAB allows you to generate animations (I prefer .gifs but I think you could output .mp4s and other formats if you wanted). 
MATLAB of course has excellent documentation on different ways to build animations, but to be honest with you, I once copied <a href="http://web.mit.edu/8.13/matlab/MatlabTraining_IAP_2012/AGV/DemoFiles/ScriptFiles/html/Part3_Animation.html">this script from MIT</a> and I have just continued to modify it for my purposes ever since. (Thanks MIT!)</p> <h2 id="concluding-thoughts">Concluding Thoughts</h2> <p>My vision for this post was to share lots of strategies for developing dynamics models that are not often taught in class. They may not all be helpful, but I had a lot of trouble finding useful advice on the Internet so I hope that this blog post helps fill that gap. If you know of more resources on this topic, please share them because I am always looking to learn more best practices for writing robust models.</p>This blog post has been a long time in the making. I have been spending many hours this semester writing mathematical models in MATLAB for a research project. During this time, I have encountered a lot of challenges that I had never really come up against in my dynamics and controls courses. I started this project feeling poorly equipped to deal with those challenges, but I have tried to develop a couple tricks and tips that make it easier to write and test different models. This blog post will cover some of those tips. I really hope this post helps some other people because I was very frustrated by this process and it would be great if I could save someone else some aggravation.Motion Planning with Calculus2021-11-30T00:00:00+00:002021-11-30T00:00:00+00:00http://sassafras13.github.io/CVF<p>In this post we are going to talk about <strong>connection vector fields</strong> and <strong>constraint curvature functions</strong>. These are two mathematical tools used to study robot locomotion. I have been using them in my research lately, and I wanted to provide a rigorous explanation of what they are and how to use them. 
For the purposes of this discussion, we are going to assume that we have a 3-link swimming robot where we know the angle of each link and the orientation and position of the entire robot system. This robot is shown in Figure 1. In this case, the connection vector fields and constraint curvature functions can both help us to study how the joint angles impact the system orientation and position. Let’s dive in to see how this works!</p> <p><img src="/images/2021-11-30-CVF-fig1.png" alt="Fig 1" title="Figure 1" /> <br /> Figure 1 - inspired by </p> <h2 id="the-general-reconstruction-equation">The General Reconstruction Equation</h2> <p>We begin by introducing the <strong>general reconstruction equation</strong>, which is used to map the joint angle velocities, $$\dot{r}$$, to the body velocity, $$\xi$$, of the robot system:</p> $\xi = -\mathbf{A}(r) \dot{r} + \mathbf{\Gamma}(r) p$ <p>Where $$\mathbf{A}(r)$$ is the <strong>local connection</strong> that maps the joint velocities to body velocity, $$\mathbf{\Gamma}(r)$$ is the momentum distribution function, and $$p$$ is the generalized momentum. For the purposes of this discussion, let’s assume that our robot has zero momentum (for example, if it were in a <a href="https://sassafras13.github.io/ScallopTheorem/">low Reynolds number environment</a>). That reduces our equation above to a specific form called the <strong>kinematic reconstruction equation</strong>:</p> $\xi = -\mathbf{A}(r) \dot{r}$ <p>The matrix $$\mathbf{A}(r)$$ is going to be the key to studying our robot system in the next sections.</p> <h2 id="connection-vector-fields">Connection Vector Fields</h2> <p>The problem with expressions for the local connection, $$\mathbf{A}(r)$$, is that they can be complicated and difficult to parse by inspection. However, if we plot the local connection, we can learn useful things about our robot system’s locomotion. 
One way to think about $$\mathbf{A}(r)$$ is to understand that each row of the matrix, when dotted with the joint velocities, $$\dot{r}$$, produces the velocity of the robot system in the corresponding body velocity component. Mathematically speaking (here each row $$\mathbf{A}^{i}(r)$$ is taken to include the negative sign from the kinematic reconstruction equation):</p> $\xi_{i} = \mathbf{A}^{i}(r) \cdot \dot{r}$ $\left[ \matrix{\xi_{x} \cr \xi_y \cr \xi_{\theta}} \right] = \left[ \matrix{A^{x,1}(r) &amp; A^{x,2}(r) \cr A^{y,1}(r) &amp; A^{y,2}(r) \cr A^{\theta,1}(r) &amp; A^{\theta,2}(r)} \right] \cdot \left[ \matrix{\dot{\alpha}_1 \cr \dot{\alpha}_2} \right]$ <p>Once we compute this dot product, we can plot how each body position coordinate (x, y and $$\theta$$) changes as the joint angles $$\alpha_1$$ and $$\alpha_2$$ change. Each row plotted this way is called a connection vector field, and an example of the connection vector fields for a 3-link microswimmer is shown in Figure 2. If there is a change in the joint angles that follows the arrows (i.e. gradient, or direction of change) of the connection vector field, that means that the corresponding body position is also changing. If, instead, the joint angles of the robot are changing in a way that is orthogonal to the vector field, then that would result in zero net motion for that corresponding body position.</p> <p><img src="/images/2021-11-30-CVF-fig2.png" alt="Fig 2" title="Figure 2" /> <br /> Figure 2 - inspired by </p> <p>The connection vector fields shown in Figure 2 have some physically relevant information. First of all, if we look at the plot for $$\theta$$, we can see that the gradient is generally moving towards $$\mathbf{\alpha} = (+\alpha_1, -\alpha_2)$$, suggesting that the entire orientation of the microswimmer prefers to rotate against the direction of rotation of the outer links as compared to the center. Secondly, for the $$\mathbf{A}^x$$ plot we can see that the gradient of the field is almost 0 at the point $$\mathbf{\alpha} = (0, 0)$$, when the entire robot system is completely horizontal. 
This should make sense because we can imagine that if the robot is completely horizontal, it is only experiencing forces along the x-axis, which will encourage the robot to remain in this configuration. Conversely, for the $$\mathbf{A}^y$$ plot the gradient of the field is 0 at the points $$\mathbf{\alpha} = (\pm \frac{\pi}{2}, \pm \frac{\pi}{2})$$ because here the robot is vertically oriented and forces along the y-axis will cause it to remain in this configuration .</p> <h2 id="constraint-curvature-functions">Constraint Curvature Functions</h2> <p>The other useful tool for understanding the local connection is used to understand the total displacement of the robot over a series of motions, instead of its instantaneous motion. If we imagine the two outer links of the robot moving cyclically, we can imagine that by moving both forwards and backwards, they should be able to achieve a gait that results in net movement of the robot. We can study the curvature of the local connection to understand how much displacement the robot will experience for different gaits. This information is presented as a set of constraint curvature functions (CCFs) or height functions (when working in 2D) .</p> <p>In order to better understand CCFs, we need to take a quick detour into some topics from multivariable calculus, starting with the <strong>curl operator</strong>.</p> <h3 id="curl-operator">Curl Operator</h3> <p>When we have a vector field (like the ones shown in Figure 2), we can compute the curl, $$\nabla \times \mathbf{F}$$, of the vector field, which tells us how much the field is circling at a given point . If we compute the curl of the row of the local connection corresponding to the robot system’s orientation, $$\mathbf{A}^{\theta}$$, then we can see how much net rotation we can achieve for a given combination of joint angles, $$\mathbf{\alpha}$$ . But just how much net rotation will we achieve? 
In order to compute this, we need to revisit <strong>Green’s Theorem</strong>.</p> <h3 id="greens-theorem">Green’s Theorem</h3> <p>Green’s Theorem is a special, 2-dimensional case of <strong>Stokes’ Theorem</strong>, which states that “the line integral of a vector field over a loop is equal to the flux of its curl through the enclosed surface” . This essentially means that if we integrate over a closed loop drawn in a vector field, we will obtain a value that is equal to the amount of curvature in the field within the loop. It is useful to us because we can draw a loop in our connection vector fields, which corresponds to a cyclic gait defined by $$\mathbf{\alpha} = (\alpha_1(t), \alpha_2(t))$$, and integrate over that loop to determine what the robot’s net displacement will be for that gait.</p> <p>We typically use Green’s Theorem for this calculation simply because our vector fields are 2-dimensional. Green’s Theorem in mathematical terms is :</p> $\oint_C (L dx + M dy) = \iint_D \bigg( \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y} \bigg) dx dy$ <p>An example plot of the constraint curvature function for our robotic system is shown in Figure 3. If the gait cycle (drawn as a loop over this surface) moves in a positive direction (i.e. counterclockwise using the right hand rule) around regions of the surface where the curl is positive, then that means we will achieve a net positive rotation of the entire system. Similarly, if we draw a loop that moves in a negative (clockwise) direction around regions of the surface where the curl is negative, we will also achieve a net positive rotation. Overall, this plot helps us to understand what kinds of gaits will result in net displacement in x, y and $$\theta$$ .</p> <p><img src="/images/2021-11-30-CVF-fig3.png" alt="Fig 3" title="Figure 3" /> <br /> Figure 3 - inspired by </p> <h2 id="conclusion">Conclusion</h2> <p>I hope this post was a useful introduction to one approach for motion planning. 
I would recommend reading both  and  if you are curious to learn more about this topic -  in particular is a good in-depth explanation with several examples available for review.</p> <h2 id="references">References</h2> <p> Hatton, R. L., &amp; Choset, H. (2013). Geometric swimming at low and high Reynolds numbers. IEEE Transactions on Robotics, 29(3), 615–624. https://doi.org/10.1109/TRO.2013.2251211</p> <p> “Curl (mathematics).” Wikipedia. <a href="https://en.wikipedia.org/wiki/Curl_(mathematics)">https://en.wikipedia.org/wiki/Curl_(mathematics)</a> Visited 30 Nov 2021.</p> <p> “Stokes’ theorem.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Stokes%27_theorem">https://en.wikipedia.org/wiki/Stokes%27_theorem</a> Visited 30 Nov 2021.</p> <p> “Green’s theorem.” Wikipedia. <a href="https://en.wikipedia.org/wiki/Green%27s_theorem">https://en.wikipedia.org/wiki/Green%27s_theorem</a> Visited 30 Nov 2021.</p> <p> Hatton, R. L., &amp; Choset, H. (2011). Geometric motion planning: The local connection, Stokes’ theorem, and the importance of coordinate choice. The International Journal of Robotics Research, 30(8), 988–1014. https://doi.org/10.1177/0278364910394392</p>
Google’s Python Style Guide Part 1 - Functionality2021-07-11T00:00:00+00:002021-07-11T00:00:00+00:00http://sassafras13.github.io/PythonStyleGuideFunc<p>I just recently learned that Google published a style guide for how their developers write clean code in Python. I wanted to use a couple of posts to outline some of the things I learned from that style guide. I will write this post to describe some of the functional recommendations given in the style guide, and a follow-up post will detail some of the specific style requirements Google listed. Let’s get started!</p> <h2 id="googles-python-style-guide---functional-recommendations">Google’s Python Style Guide - Functional Recommendations</h2> <p>The first half of Google’s style guide focuses on best practices for using different functionalities within Python. I should note that there are more recommendations than I am giving here - I have selected the items that were relevant to aspects of Python that I already use or want to use more frequently. I would highly recommend glancing through the style guide yourself if you want a more complete picture of Google’s recommendations. But for now, here is what I thought was important:</p> <p><strong>Use a code linter.</strong> A code linter is a tool that looks at code and identifies possible errors, bugs or sections that are poorly written and could contain syntax errors. Google recommends using a Python library like pylint to check your code before deploying it.</p> <p><strong>Use import statements for packages and modules but not individual classes or functions.</strong> I think this recommendation helps with namespace management - if you are only importing complete packages/modules, then we will always be able to trace back specific classes or functions to those libraries (i.e. we know that module.class is a class that belongs to “module”). This practice also helps prevent collisions (i.e. 
having multiple functions with the same name).</p> <p><strong>Import modules by full pathname location.</strong> This is important for helping the code to find modules correctly. Google recommends writing this:</p> <p><code class="language-plaintext highlighter-rouge">from doctor.who import jodie</code></p> <p>Instead of writing this:</p> <p><code class="language-plaintext highlighter-rouge">import jodie</code></p> <p><strong>Use exceptions carefully.</strong> Usually exceptions are only for breaking out of the flow for specific errors and special cases. Google recommends using built-in exception classes (like KeyError, ValueError, etc.) whenever possible. You should try to avoid using a bare “except:” statement on its own because it will catch too many situations that you probably don’t want to have to handle. On a similar note, try to avoid having too much code in a try-except block, and use a “finally” clause whenever essential actions (like closing files) must be completed whether or not an exception was raised.</p> <p><strong>Do not use global variables.</strong> Global variables are variables whose scope includes an entire module or class. Module-level constants are the exception to this rule: Python does not have a specific datatype for constants like other languages, but you can still stylistically create them, for example by writing them as _MY_CONSTANT = 13. The underscore at the beginning of the variable name indicates that the variable is internal to the module or class that is using it.</p> <p><strong>It is okay to use comprehensions and generators on simple cases, but avoid using them for more complicated situations.</strong> Comprehensions*1 and generators*2 are really useful because they do not require explicit for loops, and they are elegant and easy to read. Generators also do not require much memory. However, complicated constructions of comprehensions/generators can make your code more opaque. 
Generally, Google recommends using comprehensions/generators as long as they fit on one line or the individual components can be separated into individual lines.</p> <p><strong>Use default iterators and operators for data types that support them.</strong> Some data types, like lists and dictionaries, support default iterators and membership operators like “in” and “not in.” It’s acceptable to use these iterators and operators because they are simple, readable and efficient, but you want to make sure that you do not change a container when you are iterating over it (since lists and dictionaries are <a href="https://sassafras13.github.io/MutvsImmut/">mutable objects</a> in Python).</p> <p><strong>Lambda functions are acceptable as one-liners.</strong> Lambda functions define brief functions in an expression, such as:</p> <p><code class="language-plaintext highlighter-rouge">(lambda x: x + 1)(2)  # evaluates to 2 + 1 = 3</code></p> <p>They are convenient but hard to read and debug. They also are not explicitly named, which can be a problem. Google recommends that if your lambda function is longer than 60 to 80 characters, then you should just write a proper function instead.</p> <p><strong>Default argument values can be useful in function definitions.</strong> You can assign default values to specific arguments to a function. You always want to place these parameters last in the list of arguments for a given function. This is a good practice when the normal use case for a function requires default values, but you want to give the user the ability to override those values in special circumstances. One downside to this practice is that the defaults are only evaluated <em>once</em>, when the function is defined (which usually happens when the module containing the function is loaded). 
If the argument’s value is <em>mutable</em>, and it gets modified during runtime, then the default value for the function has been modified <em>for all future uses</em> of that function!*3 So the best practice to avoid this issue is to make sure that you do not use mutable objects as default values for function arguments.</p> <p><strong>Use implicit false whenever possible.</strong> All empty values are considered false in a Boolean context, which can really help with improving your code readability. For example, this is how we would write an implicit false:</p> <p><code class="language-plaintext highlighter-rouge">if foo: …</code></p> <p>This is the explicit version, which is not as clean:</p> <p><code class="language-plaintext highlighter-rouge">if foo != []: …</code></p> <p>Not only is the implicit approach cleaner, it is also less error prone. The only exception is <strong>if you are checking integers</strong>, when you want to be explicit, i.e.:</p> <p><code class="language-plaintext highlighter-rouge">if foo == 0: …</code></p> <p>In this case you want to be clear about whether you are asking if the integer variable’s value is zero, or if the variable has not been set at all (in which case you would use “if foo is None”). Also, remember that empty sequences are false - you don’t need to check if they’re empty using “len(sequence)”.</p> <p><strong>Annotate code with type hints.</strong> This is especially good practice for function definitions. It helps with readability and maintainability of your code. It often looks like this:</p> <pre><code class="language-(python3)">def my_func(a: int) -&gt; list[int]: … </code></pre> <p>That is all for today’s post on functional recommendations in Google’s Python style guide. Next time, I will write more specifically about the stylistic recommendations that Google provides for coding in Python. 
Thanks for reading!</p> <h2 id="footnotes">Footnotes</h2> <p>*1 Comprehensions are a tool in Python that let you iterate over certain data types like lists, sets, or generators. They can make your code more elegant, and allow you to generate iterables in one line of code. The syntax for a comprehension looks like this: <code class="language-plaintext highlighter-rouge">new_list = [expression for member in iterable]</code></p> <p>*2 Generator functions are useful for iterating over really large datasets. They are called “lazy iterators” because they do not store their contents in memory; each value is produced only when it is requested. They also use the “yield” statement instead of the “return” statement. This means that they can send a value back to the code that is calling the generator function, but they do not exit after yielding it, as a regular function does after returning. This allows generator functions to remember their state between calls. In this way generators are very memory efficient but allow for iteration similar to comprehensions.</p> <p>*3 This happened to a classmate of mine once, and he said it almost ruined a paper submission for him. This is covered in detail in .</p> <h2 id="references">References</h2> <p> “Google Python Style Guide.” <a href="https://google.github.io/styleguide/pyguide.html">https://google.github.io/styleguide/pyguide.html</a> Visited 11 Jul 2021.</p> <p> Mallett, E. E. “Code Lint - What is it? What can help?” DCCoder. 20 Aug 2018. <a href="https://dccoder.com/2018/08/20/code-lint/">https://dccoder.com/2018/08/20/code-lint/</a> Visited 28 Jun 2021.</p> <p> “Built-in Exceptions.” The Python Standard Library. <a href="https://docs.python.org/3/library/exceptions.html">https://docs.python.org/3/library/exceptions.html</a> Visited 11 Jul 2021.</p> <p> Hsu, J. “Does Python Have Constants?” Better Programming on Medium. 7 Jan 2020. 
<a href="https://betterprogramming.pub/does-python-have-constants-3b8249dc8b7b">https://betterprogramming.pub/does-python-have-constants-3b8249dc8b7b</a> Visited 11 Jul 2021.</p> <p> Timmins, J. “When to Use a List Comprehension in Python.” Real Python. <a href="https://realpython.com/list-comprehension-python/">https://realpython.com/list-comprehension-python/</a> Visited 11 Jul 2021.</p> <p> Stratis, K. “How to Use Generators and yield in Python.” Real Python. <a href="https://realpython.com/introduction-to-python-generators/">https://realpython.com/introduction-to-python-generators/</a> Visited 11 Jul 2021.</p> <p> Burgaud, A. “How to Use Python Lambda Functions.” Real Python. <a href="https://realpython.com/python-lambda/">https://realpython.com/python-lambda/</a> Visited 11 Jul 2021.</p> <p> Reitz, K. “Common Gotchas.” The Hitchhiker’s Guide to Python. <a href="https://docs.python-guide.org/writing/gotchas/">https://docs.python-guide.org/writing/gotchas/</a> Visited 11 Jul 2021.</p>
Week of July 5 Paper Reading2021-07-06T00:00:00+00:002021-07-06T00:00:00+00:00http://sassafras13.github.io/WeekJul5Rdg<p>This week I have been interested in reading papers about how to model time series data using unsupervised methods in machine learning. 
I will briefly summarize a couple of papers on the topic below.</p> <h2 id="paper-1-velc-a-new-variational-auto-encoder-based-model-for-time-series-anomaly-detection-by-zhang-et-al">Paper 1: VELC: A New Variational Auto Encoder Based Model for Time Series Anomaly Detection by Zhang et al.</h2> <p>This paper presents a method for finding anomalies in time series data using variational autoencoders. I did not know what <strong>anomaly detection</strong> really was until I read this paper - it is essentially the practice of looking for rare events in the data that are very different from the rest of the dataset, but are likely to be important, not random noise. Anomaly detection can be really difficult to do in a supervised fashion because the size of the anomaly class will generally be much smaller than the size of the “normal” class. But this paper proposes an unsupervised learning approach that side-steps that problem .</p> <p>The authors introduce a VAE that has an additional re-Encoder and Latent Constraint network (VELC) that helps the model tell the difference between normal and anomalous data based on how well the model can reconstruct the input data. The basic idea here is that the model is trained to encode and decode normal data, and as training progresses it will minimize its reconstruction error (i.e. how different the reconstructed data is from the original input data). Then when the model is given a mix of normal and anomalous test data, the reconstruction error should increase dramatically for the anomalous samples as compared to the normal samples, indicating which samples are anomalous. So if the reconstruction error is small, the input is normal; if the error is large, the input is anomalous .</p> <p><img src="/images/2021-07-06-WeekJul5Rdg-fig1.png" alt="Fig 1" title="Figure 1" /> <br /> Figure 1 - Source: </p> <p>A more detailed view of the VELC model is shown in Figure 1. 
The VAE itself uses an LSTM as the encoder and decoder, because the LSTM is designed to process time-series data. There is a constraint network that learns the latent space in the VAE alongside the encoder and decoder during training. The purpose of the constraint network is to limit the samples pulled from the latent space during testing to only look like samples it saw during training - in other words, the constraint network ensures that the VELC model only pulls normal samples from the latent space of the VAE. The re-encoder (a second encoder) maps the output of the first decoder to a new latent space. The authors argue that the re-encoder helps to ensure that the model trains more accurately, and computes more accurate anomaly scores, than it would with just a classical VAE structure.</p> <h2 id="paper-2-a-deep-neural-network-for-unsupervised-anomaly-detection-and-diagnosis-in-multivariate-time-series-data-by-zhang-et-al">Paper 2: A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data by Zhang et al.</h2> <p>This paper introduces a new model, Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED), which, unlike VELC, performs anomaly detection across multiple time series at the same time instead of considering a single time series. (The authors refer to this as <strong>multivariate time series</strong> data.) Zhang et al. argue that their model is the first to simultaneously complete 3 tasks:</p> <ol> <li>Anomaly detection: as above</li> <li>Root cause identification: identifying which time series signal(s) in the input are contributing to the anomaly</li> <li>Anomaly severity: giving the user a metric that estimates how strongly the anomaly deviates from normal data</li> </ol> <p><img src="/images/2021-07-06-WeekJul5Rdg-fig2.png" alt="Fig 2" title="Figure 2" /> <br /> Figure 2 - Source: </p> <p>The graphical abstract for this paper is given in Figure 2. 
The basic idea of this model is similar to the VELC model in that it also tries to reconstruct an input signal, and uses the reconstruction error as an indication of whether that input is normal or anomalous. The MSCRED model also uses a variant of an LSTM to handle the time series data, similar to VELC. The difference is that the MSCRED model assumes that there may be useful correlations between different time series signals that we should look for in order to identify anomalies in the full dataset. Let’s take a closer look at the different components of the MSCRED model to understand how it looks for correlations across time series signals as well as within them.</p> <p>The authors assume that the raw data is in the form of <em>n</em> time series of length <em>T</em>; they also assume that the data is normal in the interval [0, <em>T</em>] but that the data input to the model after that time can be abnormal. Their real-world example is a power plant that has time series data from different sensors that together can be used to look for anomalies that could be indications of potential failures. But the input to the MSCRED is not the raw time series data. Instead, Zhang et al. compute the pairwise correlations between each pair of time series and save that data in <em>n x n</em> <strong>signature matrices</strong>. It is these signature matrices that then become the input to the model.</p> <p>The first component of the model is a convolutional encoder that is designed to look for correlations across the time series signals (i.e. for correlations between the entries in the signature matrix, where each entry is the correlation between two signals). This convolutional encoder learns to represent the spatial information in the signature matrices, and passes this on to an attention-based convolutional LSTM model (ConvLSTM). 
The authors explain that they adapted the original ConvLSTM model , which was able to learn the temporal information in a video sequence, but struggled to perform over longer time intervals. To mitigate this issue, they add an attention mechanism to the original ConvLSTM which allows it to selectively remember the relevant hidden states across time steps, increasing the memory of the model. Together, the attention mechanism and ConvLSTM are capable of finding both temporal and spatial patterns in the signature matrix, and they return feature maps indicating which elements in time and space are important to pay attention to. The feature maps are processed by a convolutional decoder and used to reconstruct the signature matrices. We use the residual of the signature matrices (i.e. the difference between the input and the output signature matrices) to compute a reconstruction score and identify which inputs are anomalies and which are normal. These scores help identify the anomalies, diagnose which signals contributed to the anomaly (i.e. root cause analysis) and the scores also contain information about the severity of the anomaly .</p> <h2 id="paper-3-learning-to-simulate-complex-physics-with-graph-networks-by-sanchez-gonzalez-et-al">Paper 3: Learning to Simulate Complex Physics with Graph Networks by Sanchez-Gonzalez et al.</h2> <p>DeepMind delivers again with a beautiful paper on how graph networks can form the basis of a <strong>learned simulator</strong> that can model the physics of a variety of systems (i.e. from water to sand). This paper introduces a Graph Network-based Simulator (GNS) that uses a graph-based representation to model the physics of a system of particles. 
The authors argue that the value of building a learned simulator with artificial intelligence is that it a) can be built much faster, b) can run more efficiently (both in time and in memory allocation) and c) remains efficient when scaled up to larger systems.</p> <p><img src="/images/2021-07-06-WeekJul5Rdg-fig3.png" alt="Fig 3" title="Figure 3" /> <br /> Figure 3 - Source: </p> <p>As shown in Figure 3, the overall architecture of DeepMind’s solution relies on a learned simulator that regularly updates the model of the system to accurately recreate the dynamics of a group of particles that represent water or sand or other fluid or rigid systems. In Figure 3, the simulator uses some learned dynamics, $$d_{\theta}$$, to update the states of all the particles and simulate their trajectories. The learned dynamics, $$d_{\theta}$$, are represented by a set of graph networks. Let’s dive into the structure of $$d_{\theta}$$ in more detail.</p> <p>The dynamics are modeled using 3 key components: an encoder, a processor and a decoder. The encoder takes as input the first state of all of the particles in the system, and encodes that information into a graph in the latent space. This graph is then passed to the processor, which learns message-passing functions that connect all the nodes in the graph together (within a certain radius) and generates a series of output latent graphs that represent the progression of the system over time. The decoder extracts the relevant dynamics information (i.e. accelerations) from the final latent graph and passes them to an update mechanism, which in this case is a simple Euler integration function. In essence, this entire GNS is still using integration to solve the dynamics by stepping through sequential time steps - the only complexity comes from how the dynamics are learned, represented and applied. 
The authors argue that this model is very general and so can be used to represent many different types of particle systems.</p> <p>This all still felt a little general to me until I read deeper into the methods section to understand how each piece (the encoder, the processor and the decoder) was actually implemented. The encoder takes in the position, last <em>C</em> velocities and static properties of each particle and assigns one node to each particle. The encoder then learns functions to embed the input data into the nodes and edges of this graph. These encoder embedding functions are MLPs. The graph embedded by the encoder is then passed to the processor, which has a stack of <em>M</em> graph networks with identical structure. The processor learns edge and node update functions, which are also MLPs. Finally, the decoder has a learned function (also an MLP) that is applied to each node on the final graph from the processor and outputs the second derivatives (i.e. accelerations) for each node to be passed to the update mechanism.</p> <p>I also wanted to point out that I loved the way that this paper was written. Its figures do an excellent job of giving a high-level view of the architecture that is simple without losing too much resolution. The paper is written so that the reader spirals around the architecture, adding successively more detail on each pass. In total, I believe the authors went through the architecture three times, each time adding more information on the philosophy and implementation of the model design. This is something I would really like to do in my own writing.</p> <h2 id="references">References:</h2> <p>Zhang, C., Li, S., Zhang, H., &amp; Chen, Y. (2019). VELC: A New Variational AutoEncoder Based Model for Time Series Anomaly Detection. https://arxiv.org/abs/1907.01702v2</p> <p>Zhang, C., Song, D., Chen, Y., Feng, X., Lumezanu, C., Cheng, W., Ni, J., Zong, B., Chen, H., &amp; Chawla, N. V. (2018). 
A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data. 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, 1409–1416. https://arxiv.org/abs/1811.08055v1</p> <p>Shi, X., Chen, Z., Wang, H., Yeung, D. Y., Wong, W. K., &amp; Woo, W. C. (2015). Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In NIPS, 802–810.</p> <p>Sanchez-Gonzalez, A., Godwin, J., Pfaff, T., Ying, R., Leskovec, J., &amp; Battaglia, P. W. (2020). Learning to Simulate Complex Physics with Graph Networks.</p> Getting Some Intuition for Matrix Exponentials 2021-06-11T00:00:00+00:00 http://sassafras13.github.io/MatrixExps <p>In a <a href="https://sassafras13.github.io/MLSBasics/">recent post</a>, we talked about some fundamental mathematical operations presented in a robotics textbook written by Murray, Li and Sastry (MLS). One of these operations was a <strong>matrix exponential</strong>, which was unfamiliar to me. It turns out that matrix exponentials are a really cool idea that appears in lots of fields of science and engineering. Thanks to a fabulous video by 3Blue1Brown, I am going to present some of the basic concepts behind matrix exponentials and why they are useful in robotics when we are writing down the kinematics and dynamics of a robot.</p> <p>In MLS, the authors explain that matrix exponentials are useful for “map[ping] a twist into the corresponding screw motion”. Recall that a twist describes an infinitesimally small motion, while the screw contains the full magnitude of the motion. 
Another way to say this is that the matrix exponential can encode a rotation as a function of the direction of rotation and the angle of rotation. I will explain this in more detail in this post.</p> <h2 id="basic-definition-of-a-matrix-exponential">Basic Definition of a Matrix Exponential</h2> <p>A matrix exponential is related to the simpler concept of raising the number <em>e</em> to a real number exponent. We can write this operation as an infinite sum:</p> $e^x = x^0 + x^1 + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \dots + \frac{1}{n!}x^n + \dots$ <p>Notice that this expression is a <a href="https://sassafras13.github.io/TaylorSeries/">Taylor series</a>. As more terms are added, the sum approaches the value of $$e^x$$.</p> <p>A matrix exponential is, in a sense, an extension of this idea, using a square matrix as the input $$x$$ instead of a real number. For example, I can rewrite the expression above with $$x = \left[ \matrix{1 &amp; 2 \cr 3 &amp; 4} \right]$$:</p> $e^{\left[ \matrix{1 &amp; 2 \cr 3 &amp; 4} \right]} = \left[ \matrix{1 &amp; 2 \cr 3 &amp; 4} \right]^0 + \left[ \matrix{1 &amp; 2 \cr 3 &amp; 4} \right]^1 + \frac{1}{2}\left[ \matrix{1 &amp; 2 \cr 3 &amp; 4} \right]^2 + \frac{1}{6}\left[ \matrix{1 &amp; 2 \cr 3 &amp; 4} \right]^3 + \dots$ <p>This still makes sense because I can raise a square matrix to a non-negative integer power <em>n</em> by multiplying together <em>n</em> copies of the matrix (with the zeroth power defined as the identity matrix). And this infinite series converges for any square matrix, so the sum always approaches a stable value - in this case, a stable matrix.</p> <p>The matrix exponential is useful in mathematics when we are trying to solve a system of differential equations. 
For example, let’s say I want to find expressions for $$x(t)$$, $$y(t)$$ and $$z(t)$$ given the equations below:</p> <p>$$\frac{dx}{dt} = a \cdot x(t) + b \cdot y(t) + c \cdot z(t)$$ <br /> $$\frac{dy}{dt} = d \cdot x(t) + e \cdot y(t) + f \cdot z(t)$$ <br /> $$\frac{dz}{dt} = g \cdot x(t) + h \cdot y(t) + i \cdot z(t)$$</p> <p>The solution can be written in terms of the matrix exponential of the coefficient matrix multiplied by $$t$$:</p> $e^{\left[ \matrix{a &amp; b &amp; c \cr d &amp; e &amp; f \cr g &amp; h &amp; i} \right]t}$ <p>More generally, if I have a vector of unknown functions $$X(t)$$ and a matrix of constant coefficients $$M$$, then I can solve the following differential equation, written in terms of linear algebra, to find expressions for all the functions contained in $$X(t)$$:</p> $\frac{d}{dt}X(t) = MX(t)$ <h2 id="remembering-e-and-its-derivative">Remembering e and Its Derivative</h2> <p>There is another expression that looks remarkably similar to the one shown just above. Specifically, the derivative of $$e^{rt}$$ has the same form as the differential equation for the system above:</p> $\frac{d}{dt}e^{rt} = re^{rt}$ <p>(Keep in mind that we also need to take the initial conditions into account if we want to find the solution to a specific system of equations.)</p> <p>Bringing it all together, it can be shown (check out the last part of the 3Blue1Brown video) that the derivative of the matrix exponential follows the same form as the derivative of $$e$$ raised to a real number. 
With initial conditions $$X_0 = X(0)$$, that is:</p> $\frac{d}{dt} e^{Mt} X_0 = M \big( e^{Mt} X_0 \big)$ <p>In other words, $$X(t) = e^{Mt} X_0$$ solves the system.</p> <h2 id="showing-that-the-definition-of-a-matrix-exponential-is-correct">Showing that the Definition of a Matrix Exponential is Correct</h2> <p>Let’s take a simple example of how the matrix exponential is used to encode rotations, which we mentioned earlier is one of the reasons why they are so useful in robot kinematics. 
When we find a matrix that correctly encodes a given rotation, we will see that it is typically a skew-symmetric matrix which can be used to convert between screws and twists.</p> <p>Consider the matrix $$\left[ \matrix{0 &amp; -1 \cr 1 &amp; 0} \right]$$. This is the coefficient matrix of the following system of equations:</p> $\frac{d}{dt} \left[ \matrix{x(t) \cr y(t)} \right] = \left[ \matrix{0 &amp; -1 \cr 1 &amp; 0} \right] \left[ \matrix{x(t) \cr y(t)} \right]$ <p>Geometrically, this expression indicates that the rate of change of $$\left[ \matrix{x(t) \cr y(t)} \right]$$ is perpendicular to $$\left[ \matrix{x(t) \cr y(t)} \right]$$ itself and has the same magnitude, so the point moves along a circle (this is shown in Figure 1).</p> <p><img src="/images/2021-06-11-MatrixExps-fig1.png" alt="Fig 1" title="Figure 1" /> <br /> Figure 1 - Source </p> <p>But mathematically, why does this make sense? If we compute the Taylor series of $$e^{\left[ \matrix{0 &amp; -1 \cr 1 &amp; 0} \right]t}$$, we will find that each entry of the resulting matrix becomes an infinite sum with a specific pattern, as follows:</p> $e^{\left[ \matrix{0 &amp; -1 \cr 1 &amp; 0} \right] t} = \left[ \matrix{ 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + \dots &amp; -t + \frac{t^3}{3!} - \frac{t^5}{5!} + \frac{t^7}{7!} - \dots \cr t - \frac{t^3}{3!} + \frac{t^5}{5!} - \frac{t^7}{7!} + \dots &amp; 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + \dots} \right]$ <p>And guess what? Those infinite sums are exactly the Taylor series for the sine and cosine functions:</p> $e^{\left[ \matrix{0 &amp; -1 \cr 1 &amp; 0} \right] t} = \left[ \matrix{ \cos(t) &amp; -\sin(t) \cr \sin(t) &amp; \cos(t)} \right]$ <p>This is exactly the matrix for a counterclockwise rotation by an angle $$t$$. How cool is that? 
So now we have a direct mathematical argument that the matrix exponential of $$\left[ \matrix{0 &amp; -1 \cr 1 &amp; 0} \right] t$$ is the rotation matrix for the angle $$t$$. 
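</p> <p>As a quick numerical sanity check, the sketch below sums the Taylor series directly and compares the result with the sine and cosine matrix above. It is a minimal pure-Python illustration, not a production routine: <code>mat_mul</code> and <code>mat_exp_series</code> are hypothetical helper names, and the choices of 30 terms and $$t = \pi/3$$ are arbitrary assumptions.</p>

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def mat_exp_series(m: Matrix, terms: int = 30) -> Matrix:
    """Approximate e^m by summing the first `terms` terms of its Taylor series."""
    n = len(m)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]  # running sum, starts at m^0
    term = [row[:] for row in identity]    # holds m^k / k! for the current k
    for k in range(1, terms):
        # Build m^k / k! from the previous term m^(k-1) / (k-1)!.
        term = [[entry / k for entry in row] for row in mat_mul(term, m)]
        result = [
            [result[i][j] + term[i][j] for j in range(n)] for i in range(n)
        ]
    return result


t = math.pi / 3
approx = mat_exp_series([[0.0, -t], [t, 0.0]])
exact = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
max_error = max(
    abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2)
)
print(max_error)
```

<p>The printed error should be tiny, close to floating-point precision. Truncating the series like this works well for small matrices with small entries, but for large or badly scaled matrices a library routine is a better choice than a hand-rolled sum.</p> <p>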
This is why the matrix exponential is so useful in robot kinematics: the matrix $$\left[ \matrix{0 &amp; -1 \cr 1 &amp; 0} \right]$$ encodes useful information about the rotation in a skew-symmetric form.</p> <h2 id="references">References:</h2> <p>“How (and why) to raise e to the power of a matrix, DE6.” 3Blue1Brown. 1 Apr 2021. <a href="https://www.youtube.com/watch?v=O85OWBJ2ayo&amp;list=PLZHQObOWTQDNPOjrT6KVlfJuKtYTftqH6&amp;index=6">https://www.youtube.com/watch?v=O85OWBJ2ayo&amp;list=PLZHQObOWTQDNPOjrT6KVlfJuKtYTftqH6&amp;index=6</a> Visited 11 Jun 2021.</p> <p>Murray, R., Li, Z., Sastry, S. “A Mathematical Introduction to Robotic Manipulation.” CRC Press. 1994.</p>