Published in May, 2018
I have been at Microsoft Research Cambridge for the last three months working on my master's thesis. Today is my last day here and I'm writing this to wrap up my experience. It was a great time: I learned a lot, got acquainted with plenty of great people and just had fun.
Below I will write about the lessons I've learned during my work there. Some of them might look obvious to you, and some of them were obvious to me before, but I was too lazy to apply them in practice. After banging my head against the wall several times, I've decided to be smarter. Here you go.
Time is precious. Especially your time. Don't spend it on doing dumb things that your computer can do. Computers were created to do the stupid routine tasks that humans are bored to death by.
Try to build a fully automated pipeline for your experiments, from data loading and cleaning through to the analysis of the results. For me, the two most useful things were using Docker and emailing experiment results to my inbox. You do not spend much time replicating experiments, and you are not tempted to eyeball the logs before they are ready. When you get an email, you can read the logs and see the plots from any device you have.
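As an illustration, here is a minimal sketch of the email step in Python, using only the standard library. The function names, the addresses and the SMTP host and port are assumptions made for the example, not the setup used for the experiments described here.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_report(subject, sender, recipient, log_text, attachments=()):
    """Assemble an email with the log as the body and files (e.g. plots) attached."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(log_text)
    for path in map(Path, attachments):
        # Attach every file as generic binary data under its original name.
        msg.add_attachment(
            path.read_bytes(),
            maintype="application",
            subtype="octet-stream",
            filename=path.name,
        )
    return msg


def send_report(msg, host="localhost", port=25):
    """Send the message via an SMTP server; host and port are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Calling `build_report(...)` at the end of an experiment script and passing the result to `send_report(...)` would deliver the logs and plots to your inbox. Most real mail servers also require TLS and a login, which this sketch leaves out.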
Sometimes it's very tempting to quickly change the code without thinking much about the consequences. Sometimes it's very tempting to do math or keep notes in a sloppy way. But this is a road to hell. You will spend time doing it the first time, and then you will spend time fixing it or doing the same thing again properly, because you cannot find the results or they are so messy that you have no idea how to use them. And this chaos will only grow.
How to fix it? Easy! Do everything as if it goes into production. Yes, there should be a healthy balance between being too pedantic and being too sloppy. But try to do everything as if you have to submit your work tonight and you do not want to be ashamed of it.
This lesson is closely related to the previous one. You have no right to do the same thing twice. You have no right or time to repeat an experiment because you've lost the results. Keep a log in Google Docs, OneNote or whatever, and keep it in order. Future you will be grateful for that.
Save all the experimental data, all the results and all the models. Save all the logs. Back all of that up. HDDs are not so expensive nowadays. Time and your sanity are more expensive than that.
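One way to make saving automatic is to give every run its own directory and write the configuration, the metrics and the log into it. The sketch below assumes a simple layout (`config.json`, `metrics.json`, `log.txt`) and a hypothetical `save_run` helper. Both are illustrative choices, not a standard.

```python
import json
import time
import uuid
from pathlib import Path


def save_run(root, config, metrics, log_text):
    """Write one experiment's config, metrics and log into a fresh directory."""
    # Timestamp plus a short random suffix, so two runs never share a directory.
    name = f"{time.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"
    run_dir = Path(root) / name
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2, sort_keys=True))
    (run_dir / "log.txt").write_text(log_text)
    return run_dir
```

Because each run lands in a new directory, nothing is overwritten, and backing everything up is a matter of copying the root directory.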
If you have to wait for several days for the results of the experiments, start today. Do not let your computer stand still because you haven't prepared the experiments to run. Create a plan of the experiments and stick to it. Solve all the problems that do not let you start an experiment. And then do all the other stuff while your CPUs/GPUs are burning. You definitely will not stick to the original plan, but you will change it in the future and stick to the new one.
I'm 99.99% sure that the problem you're working on is not entirely new. People have already thought about it. Tell people around you what you're working on. Either they have thought about something similar before, or they are new to your problem and will give you fresh ideas on it. Ask them about their problems; maybe you can help them.
It's not only about help. When you explain things to people, especially if they're new to your problem, you put things in order in your own head. And this will help you understand the material more deeply.
Debugging in programming is painful. But debugging in machine learning is much more painful since sometimes you have no idea if it's your code that's buggy or it's your assumptions that are false at some step.
The second type of bug is much trickier, but you should at least avoid the first type. Check every step and be sure that everything goes as you want it to go. Visualize every step and check that the model has the data you expect it to have. Check that the model is the model you want and the hyperparameters are those you want to use. Visualize and save everything (yes, Lucas, now I'm really doing this).
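Some of these checks can be turned into code that runs before every experiment. Below is a minimal sketch, assuming the data is a list of feature rows plus a list of integer class labels. The `check_dataset` name and its arguments are hypothetical, and a real project would likely do the same with its array library of choice.

```python
import math


def check_dataset(features, labels, num_features, num_classes):
    """Raise ValueError on the first problem found; otherwise return a summary."""
    if not features:
        raise ValueError("dataset is empty")
    if len(features) != len(labels):
        raise ValueError(f"{len(features)} feature rows but {len(labels)} labels")
    for i, row in enumerate(features):
        if len(row) != num_features:
            raise ValueError(f"row {i} has {len(row)} features, expected {num_features}")
        if not all(math.isfinite(x) for x in row):
            raise ValueError(f"row {i} contains NaN or infinity")
    counts = [0] * num_classes
    for i, label in enumerate(labels):
        if not isinstance(label, int) or not 0 <= label < num_classes:
            raise ValueError(f"label {label!r} at index {i} is not a valid class")
        counts[label] += 1
    # The summary is meant to be printed or logged with the run.
    return {"examples": len(features), "class_counts": counts}
```

Logging the returned summary together with the hyperparameters at the start of each run gives you something concrete to compare against what you expected.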
A lot of people are obsessed with their productivity. A lot of people publish the tools they've built for themselves. You just need to spend some time looking around and learning new tools.
It's definitely worth it. If you use vim, check out the latest IDE for the language you use. If you use an IDE, check out vim. If you save papers in a folder, try Mendeley or a similar paper manager. When you write a paper with others, try ShareLaTeX; it has great edit history tracking. There is a lot of useful stuff, just look around.
That's it. I hope it helps somebody avoid repeating my mistakes and be more effective in their research. As for me, I have a thesis deadline in mid-August and then I will be on the market searching for a PhD position.
Thanks to Sebastian Nowozin and Katja Hofmann for supervising and all the others for making these three months a great experience. Special thanks to my wife and son, who were patient enough not to see me during work days. And thanks to my parents, who support me in everything.