Great methodology delivers great theses

Adam Savage on science

I enjoy mentoring people and sharing my personal lessons learned. One of my key pieces of advice for becoming a (more) successful student and researcher, with respect to writing excellent theses, is to argue and reason well in your write-up: motivate your thoughts, justify your choices, and reason over your findings. When you not only make assertions or claims but weigh their pros and cons and convince your reader by the power of argument (and references), you position your work for the highest accolades.

Here I want to put this into a greater context: the context of good research methodology.

For most people it is an appealing (at times overwhelming) thought to ultimately strive for a good result (in the sense of outcome), especially in a graded thesis project. An excellent outcome (in ML: the model works, the prediction is correct, the experiment succeeded) is usually a good starting point for a successful thesis, as it is hard for a supervisor to fail a project that worked out well. But an excellent result can cloud the view of what really matters in research & development: good methodology.

Bill Johnson was surely not the first, but the most recent, to stress this point for me (follow this link if you like a good sermon - the reference is somewhere in the last third): we arguably learn and grow the most in our hardest times, and how we personally handle problems defines the level of success we can be entrusted with. Transferred to thesis projects, this means: especially when your experiment does not bring the outcome you hypothesized, when your model is not predicting well and nothing seems to work, you can show off your skills in doing great research; you can show how you systematically force progress where no easy option or helpful flash of inspiration offers you a short-cut.

Of course, this is the hard way, and it costs you dearly: where Churchill would have spoken of blood, toil, tears and sweat, for a researcher in AI/ML it rather means abstinence from the emotional high of apparent success, long working hours, and a feeling of abandonment when no idea of what to do next is in sight. While each case is individual (and the great grade in the end is awarded for having found your individual way through), there are a few higher-level elements of a good methodology for making progress anyway (compare Karpathy’s insights specifically on neural nets here):

  1. Verify your code: Say you have assembled a fairly complex pipeline of scripts to load and preprocess data, train an ML model, and evaluate it on a public data set using some predefined metric. Make sure that every step works (with a simple experiment/test) before assuming your pipeline just works. Usually it doesn’t, because you have a subtle bug (e.g., your input data has the wrong shape, so your model training never sees what you intend it to see; your scoring tool operates on class predictions that have an off-by-one error in the class index, so the result flatlines; or you get zero error on your test set - which is highly unlikely - because your algorithm actually saw the validation/test data during training due to a data shuffling bug; etc.). Good methodology is to test all elements (ideally in isolation) and verify that no stupid bug is hiding in your scripts (a minimal sketch of such checks follows after this list).

  2. Verify your assumptions: Especially those that went into picking a certain pipeline or model. You based it on a certain publication? Sounds good, but how close is the use case there really to your case and data - does the neural network architecture, for instance, really transfer to your problem? What kind of experiment could verify or falsify this assumption? How can you gain higher confidence in the truth of your assumptions?

  3. Progress systematically: In doing such tests and verifications, it is of great importance to choose a systematic approach to the order and design of your experiments. Start as simple as possible (a short pipeline, an easy problem where you think it must work) and ideally from a known baseline - the goal has to be one simple working solution from which you can start exploring your more complex and novel ideas. Then change only one thing at a time (e.g., the data set; the metric; a parameter in the preprocessing; a hyperparameter of the model) so that you can be as certain as possible that any observed change in your result is attributable to that change (see the configuration sketch after this list). This is how you collect evidence for your hypotheses.

  4. Think much before trying once: Speaking of hypotheses - don’t start any experiment (don’t rush into activism by trying the next thing) without a clear hypothesis that the result will confirm or falsify. Prioritize your options, and check your assumptions (and code) for any hypotheses that are not yet confirmed (a hypothesis could, for example, be that a certain kind of model is a good idea; that more data would help; that a new idea will improve the state of the art; that an idea from elsewhere transfers to your specific problem). Write down your starting position (e.g., data & hyperparameters - the part of the experimental setup you are going to change in potential subsequent experiments), your idea (what it is that you want to try), your hypothesis (what you think the result will be), and the result (what came out, and what conclusions you draw from it, leading to potential next experiments that verify or falsify them) - for everything you try (see the experiment-record sketch after this list). Prioritize your hypotheses by feasibility (you may have limited resources in time, compute, understanding, code, data, …), probability of success, and interestingness/novelty (more interesting or novel results will contribute to a great outcome of your project, while merely reproducing old ideas is not received very well even if it works). Thinking much before trying once is also important in light of the time a single experiment may take (e.g., days up to nearly a week in deep learning and especially reinforcement learning).

  5. Measure it: Measure your results, and make them comparable (things are comparable if they start from a similar environment, e.g. the same data set, and you measure the same metric). Aim for a single scalar metric that you can observe, but also record as much of everything else as can be captured (see the logging sketch after this list). Some further good and practical insights are in Andrew Ng’s upcoming book and this blog post on reproducing a deep RL paper.

  6. Write it down: This finally brings me back to my introductory mantra - if you write down your results and their interpretation in a convincing way, making plain to the reader the arguments you have collected (see the previous points), you will most likely have a great thesis. As Adam Savage puts it: “The only difference between screwing around and science is - writing it down”.
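
To make this a bit more concrete, here are a few minimal sketches in Python of what some of the points above could look like in practice; all function and variable names are my own illustrations, not prescribed tools. For point 1, the sanity checks can be as small as a handful of assertions run once before any real training, assuming NumPy arrays and a given train/test split:

```python
import numpy as np

def check_pipeline(X_train, y_train, X_test, y_test, n_classes):
    """Cheap sanity checks to run once before any real training starts."""
    # Shapes: every sample needs a label, and train/test must agree on the feature dimension.
    assert X_train.shape[0] == y_train.shape[0], "number of training samples and labels differ"
    assert X_train.shape[1] == X_test.shape[1], "train and test feature dimensions differ"

    # Labels: catch off-by-one class indices (e.g. 1..K instead of 0..K-1).
    assert y_train.min() >= 0 and y_train.max() < n_classes, "labels outside [0, n_classes)"

    # Leakage: no test sample should also occur in the training set.
    train_rows = {row.tobytes() for row in np.ascontiguousarray(X_train)}
    leaked = sum(row.tobytes() in train_rows for row in np.ascontiguousarray(X_test))
    assert leaked == 0, f"{leaked} test samples also occur in the training set"
```

Calling check_pipeline(X_train, y_train, X_test, y_test, n_classes=10) right after loading the data catches the shape, label, and leakage bugs from above before any compute is spent on training.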
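
For point 3, one way to keep the one-change-at-a-time discipline explicit is to derive every experiment from a fixed baseline configuration and override exactly one field per run - a sketch with made-up parameter names and a placeholder run_experiment function:

```python
# Baseline configuration that is known to work; every experiment changes exactly one field.
BASELINE = {"dataset": "cifar10", "lr": 1e-3, "batch_size": 64, "weight_decay": 0.0}

# Each planned experiment is the baseline plus a single override.
EXPERIMENTS = [
    {"lr": 1e-4},
    {"batch_size": 128},
    {"weight_decay": 1e-4},
]

def run_experiment(config):
    """Placeholder: train and evaluate a model for the given config, return a scalar score."""
    raise NotImplementedError

for override in EXPERIMENTS:
    assert len(override) == 1, "change only one thing at a time"
    config = {**BASELINE, **override}
    score = run_experiment(config)
    print(override, "->", score)
```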
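
For point 4, writing things down is easiest to enforce if every run must come with a record that is filled in before it starts and completed afterwards; a minimal sketch (the field names are my own, not a prescribed format):

```python
from dataclasses import dataclass

@dataclass
class ExperimentRecord:
    """One entry per experiment: the first three fields are written before the run, the rest after."""
    setup: dict           # starting position: data, hyperparameters, code version, ...
    idea: str             # what you want to try
    hypothesis: str       # what you expect to happen, and why
    result: str = ""      # what actually came out (filled in afterwards)
    conclusion: str = ""  # interpretation, and the follow-up experiments it suggests

log = [
    ExperimentRecord(
        setup={"dataset": "cifar10", "lr": 1e-3},
        idea="replace the plain CNN backbone with a pretrained ResNet-18",
        hypothesis="pretrained features transfer, so validation accuracy should improve by a few points",
    ),
]
```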
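
And for point 5, even a plain CSV file per project goes a long way towards comparable results, as long as every run records the same scalar metric together with its context - again only a sketch, to be adapted to your own setup:

```python
import csv
import datetime
import os

def log_run(path, config, metric_name, metric_value, extra=None):
    """Append one row per run so that results stay comparable across experiments."""
    row = {
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
        "config": str(config),      # same data set + same metric => comparable runs
        metric_name: metric_value,  # the single scalar you primarily track
        **(extra or {}),            # everything else worth recording (seed, runtime, ...)
    }
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

# Example: one line per training run, always with the same data set and metric.
log_run("results.csv", {"dataset": "cifar10", "lr": 1e-3}, "val_accuracy", 0.91,
        extra={"seed": 0, "runtime_s": 812})
```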

I would add (1-4) from above to Savage's statement and jointly call this the said difference: "good research methodology". If you demonstrate this, nobody will deny you great grades, largely regardless of your practical outcome. It is how you deal with arising problems that best shows off your skills as a scholar, as an engineer, and as a practitioner.

Written on May 3, 2018 (last modified: April 29, 2019)