I've mentioned before that I work as a data scientist. I've been at it for a long while now. Long enough, in fact, that they interviewed me at my job about what it was like to be me, and published the interview internally as part of their "technical career storyline" feature. The following are the contents of that interview, reproduced with permission and some minor editing.
I was teaching various subjects at various levels - one of my proudest achievements in life is that I've taught students at every level, from preschoolers to graduate students and everything in between. Before that, I was kind of lost after dropping out of grad school for physics.
I then discovered data science as a field - before, I very much thought of coding as one thing, and science as a separate thing. When I found out that there was a field that combined these two things that I liked, I decided to go for it - I went through a data science boot camp, and here I am.
I'm on the Data Science team. I'm currently working on getting SEO experiments on our experimentation platform. More generally, I've worked on experimentation, metrics, data, and how to get them all to play well with one another.
There are two aspects, actually, and it's their combination that I enjoy the most.
The first is that I feel like I'm still engaged in the pursuit of capital-T "Truth". Einstein said that he wanted to 'know the thoughts of God'. This is what motivated me to get into physics in the first place, and I thought I had to give up on it when I dropped out of the field. But now, I think that my primary task is to think soundly about data - and that, in the abstract, is a profound activity, no less connected to the "thoughts of God" than the "big questions" of physics.
The second aspect is that I get to help people, on a near-daily basis, at every level - whether it's answering stakeholders' questions, mentoring new employees, or improving our product for our consumers and businesses. It's a very direct, practical kind of help, where I can easily see the effects of my actions, which I like immensely.
I think it's a rare job that fulfills both of these aspects, and I'm lucky to have found one.
I don't know if I have one - I don't know that I'm the best coder or the best statistician or the best communicator on the team - but I think being well-rounded may be a superpower on its own.
There's this joke about grad school - that you're there to learn more and more about less and less, until you eventually learn everything about nothing. Part of my difficulty in grad school was that I didn't want to be that person who knew everything about nothing. I instead wanted to know the most important thing about everything.
Of course, when you're young, when you're still in school, it makes sense to focus on what you're naturally good at, to make the fastest progress. And we do want everyone to meet the minimum technical standards for whatever task they're working on. But in the end, I think your effectiveness is derived from the product of your dimensions rather than the sum, meaning that you may easily benefit a lot more from getting your communication skills from 0 to 1, than from getting your math skills from 9 to 10. This does require a willingness to seek out your weaknesses, and work on what doesn't come easily to you, but it's a superpower that's available to everyone.
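To make the "product rather than sum" idea concrete, here is a minimal Python sketch. The skill names, the 0-to-10 scale, and the two scoring functions are assumptions invented for illustration, not a real way of measuring anyone.

```python
from math import prod

def effectiveness_sum(skills):
    """Effectiveness if your skills simply added up."""
    return sum(skills.values())

def effectiveness_product(skills):
    """Effectiveness if your skills multiply each other."""
    return prod(skills.values())

# Hypothetical person: strong at math, no communication skills yet.
base = {"math": 9, "communication": 0}
better_math = {**base, "math": 10}          # math goes from 9 to 10
better_comm = {**base, "communication": 1}  # communication goes from 0 to 1

for label, skills in [("base", base),
                      ("better math", better_math),
                      ("better communication", better_comm)]:
    print(label, effectiveness_sum(skills), effectiveness_product(skills))
```

Under the sum, both improvements are worth the same single point. Under the product, the extra point of math changes nothing while communication sits at zero, whereas the first point of communication is what makes the math count at all.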
I'm combining these two questions into one.
Donald Knuth was right. Premature optimization is the root of all evil.
Long ago, I noticed that we were writing very similar SQL queries in a lot of our analysis, and built a system for writing these queries. That might have been okay if I kept it simple, but I wanted to make it more complete, and so it became more complicated, until I was building almost a new way of writing SQL. In the end, I think I used it myself a couple of times, and I don't know if anyone else used it. I believe it eventually just got deleted in the latest clean-up of our repo. I've heard of other people doing similar things, and not liking the result. Basically, you shouldn't try to invent a new way of writing SQL.
I think a lot of data scientists and engineers naturally like building stuff, and so are prone to fall into this "premature optimization" trap. It's easy to be seduced by the potential of a system that you think will do everything - if only you could improve it a little bit more, or if more people would use it! But I've found that such complicated, overarching systems come with lots of hidden costs, and they can become entirely unnecessary if the ground shifts underneath them.
Nowadays I like simple, lightweight systems, and am wary of building anything too complicated unless it has a ton of buy-in behind it already.
I think of the group decision making process in two steps - investigation and integration. This lines up with the data collection and analysis steps in an experiment, and can be loosely associated with the proposal/buy-in process.
During the investigation step, before the proposal, I'm just trying to map out the problem and solution spaces - and other people are the best source of getting this information. But I try very hard to NOT influence anyone's opinion, so that I get their independent, honest thoughts.
I then try to integrate, for myself, the results of the investigation, weighing everyone's opinions including my own, seeing how everything would fit together, and making adjustments as necessary. If, afterwards, I'm convinced that the problem is real and the solution would work, I would make a proposal out of this solution, at which point the "integration" step goes outside myself and becomes public.
Ideally, all the stakeholders would give you their buy-in freely and independently, but in practice I've found that you have to violate the independence principle a little bit here, and actually ping people to solicit their buy-in and convince them. Fortunately, if you've been fair and thorough in your earlier step of integrating the solution for yourself, this should not be too difficult, and your solution should be acceptable to most people.
It's not often that I've had to use this whole process, but I try to stick to it as closely as possible whenever we need to make a group decision. And I think the principle here is sound: to get the integrated solution of independent opinions. Fundamentally, I think that this principle is undergirded by the central limit theorem, which is probably the most important theorem in statistics. Similar principles are used in all kinds of group decision making, from experimental analysis to democratic elections to interview evaluations to jury trials to online reviews.
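As a rough illustration of why integrating independent opinions works, here is a small simulation sketch. The "true value" of 50, the uniform noise, and the group size of 25 are all assumptions made up for the example. Each opinion is the truth plus independent noise, and the group estimate is their plain average.

```python
import random
import statistics

def group_estimate(true_value, n_opinions, rng):
    """Average n independent, noisy opinions about the same quantity."""
    opinions = [true_value + rng.uniform(-10, 10) for _ in range(n_opinions)]
    return statistics.fmean(opinions)

def spread_of_estimates(n_opinions, trials=2000, seed=0):
    """Standard deviation of the group estimate across many repeated trials."""
    rng = random.Random(seed)
    estimates = [group_estimate(50.0, n_opinions, rng) for _ in range(trials)]
    return statistics.stdev(estimates)

single = spread_of_estimates(1)   # one person's opinion
group = spread_of_estimates(25)   # average of 25 independent opinions

print(f"spread of a single opinion:   {single:.2f}")
print(f"spread of a 25-person average: {group:.2f}")
```

With independent noise, the spread of the average should shrink by roughly the square root of the group size, so about a factor of five here. The sketch only holds if the opinions really are independent, which is why not influencing anyone during the investigation step matters.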
Confidence intervals! They give you a great deal of insight for how easy they are to understand. A lot of experimentation and the fancier calculations are just playing with confidence intervals and p-values. I think everyone who looks at any experimental result should at least have a basic understanding of confidence intervals.
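For anyone who hasn't computed one, here is a minimal sketch of a confidence interval for a mean, using the normal approximation and only the Python standard library. The function name and the sample numbers are invented for illustration; they are not from any real experiment.

```python
import math
import statistics

def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of samples."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# Made-up measurements, purely for illustration.
samples = [12.1, 9.8, 11.4, 10.6, 10.9, 12.3, 9.5, 11.0, 10.2, 11.7]

low, high = mean_confidence_interval(samples)
print(f"mean = {statistics.fmean(samples):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With a sample this small, a t-distribution critical value would give a somewhat wider and more honest interval. The normal approximation is used here only to keep the sketch dependency-free.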
I've found that it's very helpful to keep a sizable personal backlog of what I think I would enjoy, and to be somewhat patient with it. For me, this backlog has gotten big enough that I can now incorporate at least some aspect of some of those projects into a lot of the "important" tasks that people ask me to do, so that my work often ends up being a blend of what I enjoy and what others find important. And occasionally, I get lucky enough that an "important" problem for someone else has a direct solution in my personal backlog, and I get to pitch and work on what I most enjoy.
The "patience" component is important. I would, of course, put things in the backlog that I think are important or enjoyable. But if I am mistaken - if it turns out to be not that important or enjoyable - the passage of time will let me get a second look at it later, and allow me to cull the backlog to the really important or enjoyable projects. So when I eventually get to one of them (which is often, since the backlog is large), I can frequently say, "yes! I've wanted to do this for years!"
I have two tips:
I go on a small hike (just 30 minutes) up a hill just outside my house on most days, and I enjoy it a lot. It never feels good to just stay in the house the whole day. The fresh air and light, getting some exercise, and seeing people - these are all things I didn't know I missed while I was missing them. And the pictures I take of the occasional wildlife on the hike are often a hit on our team's casual Slack channel!
The second tip is to invest in your personal productivity for your home office. If you multiply out the value of your time over a year with something like a 5% increase in your productivity, you'll see that even a relatively "small" increase like 5% justifies a fairly large expenditure. For myself, I've always liked bigger, higher resolution monitors, and I haven't regretted it yet.
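The multiplication is simple enough to sketch. The hourly value and the hours per year below are hypothetical placeholders, so plug in your own numbers. Only the 5% figure comes from the tip itself.

```python
def yearly_value_of_gain(hourly_value, hours_per_year, gain_percent):
    """Value of a productivity gain over one year of work."""
    return hourly_value * hours_per_year * gain_percent / 100

# Hypothetical inputs: time valued at 50 per hour, 2000 working hours a year.
budget = yearly_value_of_gain(hourly_value=50, hours_per_year=2000, gain_percent=5)
print(f"a 5% gain is worth about {budget:.0f} per year")
```

Whatever comes out is a rough ceiling on what a one-time purchase could justify in its first year, assuming the productivity gain is real.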