Hope you’ve been enjoying the interview content! Here’s Part 1 and Part 2.
Mike L: Going in a slightly different direction, a lot of your work focuses on player similarities, team similarities, and team composition. You say that the model is not used as a player skill set appraisal. For instance, Devin Ebanks is in the same category as Josh Smith, but most people would consider Josh Smith a much more talented player. Have you considered rolling in a skill element, whether it’s PER, VORP, or some other type of player analysis tool, on top of this topological method you’ve created to dig a little deeper?
Muthu: Yes, we have. The reason we often do style is that style allows us to find undervalued players. If it’s skill, it’s just going to tell you stuff you already know. By adding in skill level you basically just get a grouping of good players and bad players. It’s going to clump all of the good players together and we’re going to say okay – LeBron James, Chris Paul – they’re all going to end up in the same place despite their different styles. With style, it allows us to separate types of good players and mix them in with players who have similar styles. We just find more value in the style network. We’ve tried it with skill and it’s good. I just personally don’t see as much value in doing it that way but it definitely works.
Andrew: Yes, that makes sense. Let’s move on to the responses to criticism section of the interview. What are the shortcomings of using per-minute stats (the scoring rate, rebounding rate)? Do you plan to do it in a different way in the future analysis? Also, what do you say about the confounding factor being pace, that people who play at a very fast pace wind up with a higher rebounding rate and higher scoring rate, et cetera?
Muthu: That’s true. On the confounding question, it’s definitely true that when you do things per-minute, team’s pace is going to influence the stats that their players put out, but at the same time, the alternative way to do it would be to do it per-possession, but per-possession doesn’t account for a player naturally playing at a faster pace. If you use per-possession, you’re going to say these two guys both score 1.2 points per possession. One guy might play the game at an inherently faster pace than the other and might be quicker at scoring than the other. You don’t get a sense of that, you just get a sense for if you give them both one possession how much are they going to score. You don’t get a sense for how many possessions that they’re going to create for you in a certain amount of time.
Basically, you win some and you lose some. If you do it per-possession you lose a sense of how fast are the players going to play for you and if you do it per-minute then you’re going to lose a sense of how many possessions is the team naturally giving them which is causing their per-minute to be higher. It’s variable. We’ve actually done it with per possession. So leading up to South by Southwest, we ran the same network with per-possession and it looked pretty much the same. The differences really aren’t as big as people might think they would be. They’re pretty comparable.
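To make the trade-off concrete, here is a minimal sketch with two made-up stat lines. The names `StatLine`, `fast`, and `slow` and all of the numbers are hypothetical and chosen only for illustration; they are not from the Ayasdi data. Both players score 1.2 points per possession, but only the per-minute rate shows that one of them gets through more possessions in the same playing time.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    points: float
    minutes: float
    possessions: float  # possessions used by the player (hypothetical figure)


def per_minute(s: StatLine) -> float:
    """Scoring rate per minute played; influenced by how fast the player or team plays."""
    return s.points / s.minutes


def per_possession(s: StatLine) -> float:
    """Scoring rate per possession used; blind to how many possessions occur per minute."""
    return s.points / s.possessions


# Two invented players with the same minutes and the same efficiency.
fast = StatLine(points=24.0, minutes=30.0, possessions=20.0)
slow = StatLine(points=18.0, minutes=30.0, possessions=15.0)

for name, s in (("fast", fast), ("slow", slow)):
    print(name, round(per_possession(s), 2), round(per_minute(s), 2))
```

Per possession the two look identical; per minute the faster player looks better, and part of that gap may come from team pace rather than from the player himself.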
Andrew: What about changing the per-minute stats into more commonly used per-game metrics?
Muthu: Yes, you could do it per 36 minutes or you could do it per game. Per game accounts for skill more than style, so we tend to stay away from that.
Andrew: The next question we had: Rob Mahoney of The New York Times called your model a novel execution of productive thought, but he cited the one-of-a-kind catch-all category as a weak point. You kind of answered this when you said you forced them into groups, but when the model forced players into the groups did you guys have to do it manually or is there a tool that you use to somehow fit them in? What’s your response to that criticism?
Muthu: There’s a tool within the software called gain. Topology gain is a sense of the amount of overlap between bins that are being clustered. That kind of goes deeper into topology but essentially it’s a measure of how many connections you want to draw between nodes. The lower the gain, the more distinct the groups are. As you increase the gain, it kind of creates more overlaps between nodes. What we do is, if we see a lot of one-of-a-kinds, we usually increase the gain and what that’s going to do is it’s going to cause the one-of-a-kinds to be connected with someone else. Based on who they’re connected to we then use that as a sense for who they should be next to.
Andrew: It’ll map it to, you said, another person or the whole cluster itself?
Muthu: It’ll basically force that one-of-a-kind to be connected in with the larger network.
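The effect of gain described above can be sketched in a few lines. This is a toy illustration, not Ayasdi's software or its actual gain parameter: it assumes a one-dimensional set of values, treats each bin as a single node (a full Mapper-style pipeline would also cluster the points inside each bin), and models gain as a hypothetical `overlap` fraction by which every bin is widened. The function names and the data are invented for the example.

```python
def build_nodes(values, n_bins, overlap):
    """Cover the value range with n_bins intervals, each widened on both sides
    by `overlap` (a fraction of the bin width). Returns one set of point
    indices per non-empty bin; each set plays the role of a network node."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    nodes = []
    for b in range(n_bins):
        start = lo + b * width - overlap * width
        end = lo + (b + 1) * width + overlap * width
        members = {i for i, v in enumerate(values) if start <= v <= end}
        if members:
            nodes.append(members)
    return nodes


def isolated_nodes(nodes):
    """Count nodes that share no points with any other node (the one-of-a-kinds)."""
    return sum(
        1
        for i, a in enumerate(nodes)
        if not any(a & b for j, b in enumerate(nodes) if j != i)
    )


# Hypothetical one-dimensional "style" values for eight players.
values = [0.0, 0.2, 0.9, 1.1, 1.95, 2.5, 3.6, 4.0]

for overlap in (0.0, 0.15, 0.6):
    nodes = build_nodes(values, n_bins=4, overlap=overlap)
    print(overlap, isolated_nodes(nodes))
```

With no overlap every bin stands alone; as the overlap grows, neighboring bins start to share players and the isolated nodes get pulled into the larger network.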
Andrew: The next criticism was from TrueHoop’s Tim Calvan. He said that the team configuration suggestions that you showed in your Sloan presentation didn’t describe how it created more wins. What’s your thinking around team configuration leading to scoring differential or win prediction? Do you have a response or was that something that you just entirely weren’t considering? Have you looked into whether higher performing teams are more balanced, et cetera?
Muthu: Yes, that’s a good question. It kind of goes back to the question about what grouping of positions works best. Again, it’s really tough because there is not one way of making a team successful; there are multiple ways to be successful. Some teams are really balanced, while teams like the Heat are very imbalanced yet they are really good. It’s hard to make categorical claims about team construction. In the example I gave it was basically one hypothesis: that maybe diversity on a team is related to a team’s success. We haven’t delved too much deeper into that to prove that it’s true or false. I think it’s kind of an unfair question for us to even want to answer because it is so tough to really think about.
Andrew: That makes sense.
Mike L: A question I have stemming off of that: do you believe that there’s any position that a team absolutely needs in order to be successful or do you think that a team can get around the lack of say a paint protector with sufficient talent at the other positions?
Muthu: I think there are some positions that are kind of stock positions that every team needs. Every team needs at least one or two ball handlers. There’s a bunch of different types of ball handlers but you need one of those. I think it’d be hard to be successful without a paint protector on your roster. The two-way All Star category is essential in that people are always saying you need a star to win. No team has really ever won without a superstar so I think you kind of need a two-way All Star, mostly. Mathematically, it’s hard to say exactly the formula that you need but I think there are some that you definitely will need on your team.
Andrew: Back to Ayasdi’s work: the next question that I had was regarding applying the model to other sports. Since you already gave the example of football, are you allowed to give out hypotheses on how to use the topological mapping tool on football or is that private information?
Muthu: Yes, we probably can’t talk to you about specific football-related stuff but some of the stuff we do with basketball applies. For example, the way we handle drafting in basketball carries over even better to football. There’s more data for college football players because they stay in college longer. The combine also gives them more data. We can do drafting, I think, even better in football than we do in basketball. We can do some injury prediction stuff even better in football, for the same reason. There are some limitations in football: everyone’s doing a different job. Some people are blocking, others are just kicking, and so on. That makes it a little tougher. But I would say a lot of our lessons from basketball do carry over to football.
Mike L: When you say we are you referring to Ayasdi?
Muthu: Broadly, yes. It’s their tool, but on the sports side I work with one or two other people who help me with some of the projects, that’s kind of what I mean by we.
Mike L: You formed a small team that focuses on sports analysis then.
Muthu: Yes, it’s a small team of two or three people.
Mike L: Sounds like you guys are doing some exciting work.
Muthu: It’s been fun for sure.
Andrew: Have you guys considered adding in on/off statistics? I know there’s a lot of stuff around plus/minus as well as on/off. What do you think about those statistics?
Muthu: They tend to work pretty well. When we’re trying to do a lot of columns and not just seven or 10, if we’re doing 20 or 30 or more we throw in stuff like plus/minus or adjusted plus/minus. When we’re doing smaller stuff we don’t just because plus/minus is confounded by teammates and it tends to throw off the analysis in smaller columns.
Andrew: Right, it would favor good teams.
Muthu: Yes, exactly.
Andrew: What about mapping players within preset age ranges to predict people’s career arc. For example looking at historical groups of sub age 24 players and looking at where they are at and how they figure within people’s development cycle? Is that something you’ve looked into or have you not really tried to weight people by an age curve type thing?
Muthu: We kind of do that with drafting. We do it by age group. We want to be able to predict career trajectories based on similarities by age points. Yes, it’s something we definitely do.
Andrew: Have you noticed any positional changes based on people’s tendencies? Like, is someone with a really high rebounding rate more likely to become X position? Basically, the question is centered around positional changes. Have you noticed anything around that?
Muthu: Yes, we have. Especially, again, in drafting we’ll see that college players who have a lot of assists tend to be really good corner three point shooters in the NBA, for example. Something like that, that’s not intuitive. You wouldn’t think that assists translate to corner threes. I’m just using that as an example and I’m not saying it does but things like that we have found. So certain college statistics translate into certain NBA statistics that you wouldn’t intuitively suspect.
Mike L: How does that apply across college basketball? There’s a wide range of colleges and a wide range of talent, right? So playing in the Ivy League is not necessarily the same thing as playing in the ACC. Just the pool of talent that you’re facing is different. What sort of adjustments do you do in order to account for the difference in talent and the skill discrepancy?
Muthu: We don’t do a lot of adjusting actually because I think when you adjust you start using subjective modeling and hypotheses to make the adjustments. By doing so you’re skewing the data. What we do is we just put in the raw data and it turns out that players who play in the Ivy League or smaller conferences tend to have statistics that just look different. Their stat lines just look different from the guys from the ACC because you tend to have one guy that leads the team in everything and some guys who don’t do anything. It’s just a different looking stat line and because of that, it’s kind of cool, you naturally find players in smaller conferences grouped together. You’ll see Stephen Curry, Damian Lillard, and C.J. McCollum and all these guys in the same place. You’ll see the small conference guys together even though you didn’t make any adjustments for that. That’s cool because then you can map apples to apples; you can map small conference guys to small conference guys. It’s pretty cool.
Mike L: Then continuing off this question, let’s say you’re Jeremy Lin and you’re playing in the Ivy League and you’re posting extremely good stats. Wouldn’t your stat line be higher overall when you compare it to someone playing in the ACC as a freshman who isn’t necessarily generating the same sort of volume just due to the talent he has on his own team?
Muthu: It’s true but it’s the same reason that Josh Smith and Devin Ebanks are together because, again, our network is more style based. Even though Jeremy Lin might have higher volume across everything, we’re looking more at the distribution of the stats than just the magnitude of them. In that sense, Jeremy Lin is not only grouped next to other guys with high volume stats, he’s grouped next to guys who have a similar distribution across the entire arc. Again, I think in a lot of ways it’s better not to make new adjustments. If Jeremy Lin does show up by himself because no one else is putting these kinds of numbers up, that’s fine. I’d rather treat him as an anomaly than try to force him to be like Kyrie Irving or someone else just after my adjustments.
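One simple way to see the difference between the distribution of a stat line and its magnitude is to rescale each stat line so its entries sum to one and compare the resulting shapes. This is only a sketch under stated assumptions: the interview does not say how the network measures similarity, and the three stat lines below (points, assists, rebounds) are invented. Summing stats with different units is also crude; it is used here purely to illustrate the idea.

```python
import math


def shape(stat_line):
    """Rescale a stat line so its entries sum to 1, keeping only the mix of stats."""
    total = sum(stat_line)
    return [x / total for x in stat_line]


# Hypothetical (points, assists, rebounds) lines.
high_volume = [20.0, 6.0, 4.0]
low_volume_same_mix = [10.0, 3.0, 2.0]
high_volume_other_mix = [20.0, 1.0, 10.0]

# Distance on raw numbers is dominated by volume.
raw_to_same_mix = math.dist(high_volume, low_volume_same_mix)
raw_to_other_mix = math.dist(high_volume, high_volume_other_mix)

# Distance on shapes ignores volume and compares only the mix.
shape_to_same_mix = math.dist(shape(high_volume), shape(low_volume_same_mix))
shape_to_other_mix = math.dist(shape(high_volume), shape(high_volume_other_mix))

print(raw_to_same_mix > raw_to_other_mix)
print(shape_to_same_mix < shape_to_other_mix)
```

On raw numbers the high-volume player sits closer to the other high-volume player; once volume is factored out, he sits with the lower-volume player who has the same mix of stats.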
Mike L: A question tying off of that, then: how do you determine which players are two-way All Stars? I don’t really have great insight into the categorization. I would’ve assumed that it would be just players that have really high volume in certain stat categories. How did you actually come up with that category?
Muthu: The same way I talked about how we came up with every other position: we used the statistical tables and K-S scores to work out how each position is different from each other position. With two-way All Stars we just saw that in terms of every single column they were just better on both offense and defense than every other position, except maybe blocks or rebounding. For the most part, they were just way above average.
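For readers unfamiliar with K-S scores, the two-sample Kolmogorov-Smirnov statistic measures the largest gap between the empirical cumulative distributions of two samples: 0 means the samples look identical and 1 means they do not overlap at all. The sketch below is a plain-Python version of that statistic applied to one made-up column; how Ayasdi computes or uses its scores is not described in the interview, and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute difference
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical steals-per-36 values for one group versus everyone else.
group = [1.8, 2.0, 2.1, 2.4]
rest = [0.6, 0.8, 1.0, 1.1, 1.3]

print(ks_statistic(group, rest))   # fully separated samples
print(ks_statistic(group, group))  # identical samples
```

Running a comparison like this column by column would show which statistics most separate one group from the rest.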
Andrew: You mentioned there’s a way to avoid or plan for injuries. Are you allowed to share any insights on how you approached that? I wouldn’t even know what columns to consider maybe outside of minutes played or something like that.
Muthu: Yes, unfortunately, I would love to but I can’t go into details about that.
Andrew: I had some miscellaneous questions. You mentioned in your Sloan presentation that Jeremy Lin got compared to DeMarcus Cousins, which I found surprising because he’s a post player. How did that grouping work out? Do you remember specifically? That’s a specific question I had.
Muthu: We used a lot more statistics in college to group the players. I believe DeMarcus played one year in college so his data was pretty sparse, it wasn’t very full. Jeremy Lin played a lot more. Again, in college the hard part is that the talent is so varied that if you’re good you’re often grouped next to other good players even if you guys are totally different. You will have post players next to wings just because they probably do well in certain categories. Since that network we’ve gotten a lot better at doing college networks and we don’t have post players next to guards anymore. But back then that’s just the way things clicked and the way we rationalized it was just that Cousins didn’t have a lot of data because he played for such a short time and they both were putting up pretty big numbers across the board.
Andrew: Where do you see basketball analytics going in terms of tools that can be applied outside of topological mapping? Or was this one where you just happened to strike gold working for Ayasdi? What do you think?
Muthu: I got pretty lucky there. There’s a lot of tools. Not just in medicine but in finance, in energy, in weather prediction. Every industry is dealing with data analysis now. It’s always cool to see how other industries handle data and to try to carry some of the same lessons over to sports and vice versa. I don’t think this is the last tool we’ll see cross over from one area to another. I think there’s going to be many, many more. It’s a matter of the size of your data and what other industries are doing. In terms of basketball analysis, I think the biggest challenge is still going to be moving from analytics to action because there’s still a huge wall between the front office and analytics and then getting to the court and influencing how an NBA player is going to play in the heat of the moment. When his emotions are high, when he’s playing a game he’s played for 23 years, why is he going to do something differently because of what some guy with a calculator told him? It’s still pretty uncertain so I think that’s really where the challenge is now – how do you get those analytics to turn into actual results.
Mike L: I think we have time for a couple of not so serious questions now. Do you have any predictions for this upcoming season?
Muthu: I tend to stay away from predictions, I don’t know.
Andrew: How many wins do the Rockets get, 55? I’m hoping for 55.
Muthu: I think the Rockets are going to do well. I’m pretty high on them.
Andrew: I’m pretty bullish too.
Michael N: Did you ever use draft measurements, like height, arm span, standing reach, stuff like that because I know that you try to stay away from pigeon holing someone just because their measurements don’t match tradition but I think there’s a level of validity to say that well if you’re not this tall you probably shouldn’t be playing a certain style.
Muthu: We have access to all the pre-draft data – bench press, hand size, and so on. We’re working on it now again on the private side with different partners. Yes, we’re looking at that stuff for sure.
Andrew: Have you guys been using historical data more? Because if you don’t normalize for the pre-1980s, their pace of play was much faster than it is today. Have you been using historical data, and has that come into play more in work for teams, or is it more within today’s world?
Muthu: The historical stuff we do is more for fun than it is for teams. It’s more for media insight and entertainment. We do run into an adjustment problem, a pace problem, but we kind of live with it.
Andrew: How much of a time commitment is your work with Ayasdi?
Muthu: It fluctuates. Over the summer, it’s over 10 hours a week. Then this winter it’s probably a little under 10 hours a week. We have other people working on the same stuff now, like I said, on a team. We’re able to get stuff done still.
Andrew: Thanks for your time. I appreciate you taking time out of your busy schedule. I know medical school is pretty busy. What’s your med school life like?
Muthu: It’s not bad. Monday through Friday there’s a good amount of class but it definitely doesn’t kill you.
Andrew: What are your plans for the future in terms of sports work or are you going to be a full time doctor?
Muthu: I don’t know, it’s still to be determined. So far, I’ve been able to do both because of proximity: Ayasdi’s down the street from Stanford Med, and the time commitment is okay on both sides. It’s been fine so far. At some point, I will have to figure it out but for now I’m just enjoying it.
Andrew: Is it mostly people your age at Ayasdi? Are the founders Stanford alums?
Muthu: The CEO is maybe 30-something. He’s a former PhD graduate at Stanford and the other two co-founders are math professors that have been teaching math for decades so they’re on the older side.
Andrew: Thanks for joining us.
Muthu: Thank you guys. Good luck with everything you are doing.