The Depth Project
Last month I attended a conference on computational modeling in games in Banff. It was a great opportunity to jam on a research question that I’ve been working on with NYU colleagues Andy Nealen, Julian Togelius, and Aaron Isaksen. We were able to share our thinking and get input from a lot of super smart people, especially Alex Jaffe, of Spry Fox, who it turns out has been working independently on this same question for a while and had lots of very useful ideas to contribute.
The question, in a nutshell – can we discover a well-defined property of game systems that corresponds, roughly, to what designers and players mean by “depth”? We’re talking about depth in a very narrow sense, the depth of abstract strategy games, not the broader, more ambiguous sense of depth as applied to thematic richness and emotional nuance. We are only looking at the very focused domain of game-as-problem. What allows some problems to sustain a lifetime of study, continually offering surprising and interesting things to think about the more you think about them? Here is a video of the brief introduction to this topic that I gave at Banff. (Not sure why they added “and fun” to the title, we aren’t really interested in that!)
What follows is an overview of where our thinking is on this topic. This is a reflection of the thoughts of all the people mentioned above, filtered through my own perspective, so credit to them for anything clever and I’ll take the blame for any blunders. Eventually this will turn into a formal paper, but in the meantime here’s where we’re at:
Why Depth?
First of all, it seems to us to be an obvious question to ask. There are well-defined formal properties of game systems that can be analyzed and measured, things like state space and branching factor. We often talk about depth as if it were a property like this, but is it? Is there some precisely definable, objectively observable property that can tell us something interesting about a game’s underlying structure?
This is not a normative project. We don’t want to make claims about what structure games should have or what makes a game good. It’s important to make this caveat repeatedly and clearly, because in casual usage “deep” often functions as a general superlative.
Similarly, we’re not looking for a threshold, a specific limit that separates “deep” games from “non-deep” games. Instead we are looking for some measurable quality (we might call it “d”) that would be present in all games, just to different degrees.
We’d also, if possible, like to sidestep the issue of psychology. We aren’t primarily interested in how humans learn or what humans find interesting. We want to know if we can make observations about certain properties of a game’s formal system that are true regardless of the particular details of human cognition. True in a way that would be recognized by aliens, in the same way that observations about state space complexity and branching factor are true.
If we are able to identify such a property, it may provide some insight into questions of artificial intelligence and machine aesthetics. If we can describe what makes certain problems interesting in this particular way then we may learn something about “interestingness” in general, about the difference between data and meaning. Deep games require knowledge, explanation, creativity, and cleverness. Modeling depth may help us better understand how to think about these things.
Such a model could also inform our discussions about game design, suggesting directions for design exploration and expanding our palette of analytical tools and techniques.
However, for me personally, the primary purpose of this project is pure knowledge. I want to know if there is some specific structural quality that is shared by Chess and Go and Poker and Bridge and Magic: the Gathering, some recognizable topological feature of their state space terrain. And if there is then I want to see it, and be able to talk about it and reason about its ramifications.
What depth isn’t
I’ve mentioned state space and branching factor several times already. Could it be that one of these features simply is the property we’re looking for? Or some simple combination of the two?
The reason these features aren’t good candidates for this property is that it’s easy to imagine counterexamples – you can take any game and inflate its state space by adding lots of extra states that the game can be in. If these states aren’t relevant to the problem of figuring out how to win then they won’t provide any of the qualities we are looking for – they won’t make the game more challenging and more interesting to study. Imagine a version of Chess where after you move a piece you can choose to flip a two-sided token to either its blue side or its red side. You’ve doubled the state space of Chess (there are now two distinct versions of every position the game can be in) but you obviously haven’t increased its depth. You can run the same thought experiment for branching factor, adding a bunch of pointless choices that don’t truly matter to the outcome of the game.
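Here is that thought experiment as code, just to make it vivid (a sketch; the underlying chess object and its legal_moves/apply/outcome methods are hypothetical stand-ins):

```python
class RedBlueChess:
    """Chess plus an irrelevant two-sided token. Every position now
    exists in a red and a blue version, doubling the state space, but
    the token is never consulted by the rules, so nothing about the
    problem of winning has changed."""

    def __init__(self, chess_state, token="red"):
        self.chess = chess_state
        self.token = token  # "red" or "blue" - never affects play

    def legal_moves(self):
        # Each chess move comes in a red flavor and a blue flavor,
        # inflating the branching factor just as pointlessly.
        return [(move, side) for move in self.chess.legal_moves()
                             for side in ("red", "blue")]

    def apply(self, move):
        chess_move, side = move
        return RedBlueChess(self.chess.apply(chess_move), side)

    def outcome(self):
        return self.chess.outcome()  # the token cannot matter here
```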
What we’re looking for is meaningful state space, meaningful branching factor. But this raises the question – what, after all, makes game states and choices meaningful? It is our assumption that state space and branching factor are necessary but not sufficient to explain the elusive property we are in search of.
Likewise, the more general property of computational complexity is clearly relevant but does not by itself do the conceptual work we need. You can have a game whose solution is arbitrarily difficult to compute without guaranteeing that the game will have the kind of features we are looking for – the capacity to reward life-long learning and support a large, long-term community of serious, dedicated players.
Evidence of depth – the skill chain
So what exactly are these features? Can we get more precise about them?
A key concept in addressing this question is the idea of the skill chain, as articulated in Characteristics of Games, by Elias et al. The size of a game’s skill chain is the number of distinct steps in the ranking of all players, where players at each step beat all the players at lower steps some significant percentage of the time.
Why is this important? Well, what it indicates is a game in which a player can improve through study: the more you think about the game, the better you get. Players of a game like this are on a journey of gradual and continuous improvement, ascending ever-upwards towards better understanding and stronger play. Trivially easy games can’t support long skill chains, and neither can all-or-nothing puzzles, no matter how large and difficult they might be.
In order to make this quality more precise and measurable, we imagine a skill chain made up of different algorithms called strategies. Each strategy is a complete set of instructions for how to play the game, including when to search ahead in the game tree, how to evaluate board positions, what rules of thumb to apply when making decisions, when to just guess randomly, etc…
The more complex these strategies become, the more computational resources (time and/or memory) they require to execute, and the better the win rates they achieve against simpler strategies.
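As a rough sketch (in Python, with an entirely hypothetical game interface – an illustration of the idea, not anything we’ve built), such a strategy might look like this:

```python
import random

class Strategy:
    """A complete recipe for playing some abstract game: rules of
    thumb first, raw look-ahead second, random guessing last.
    `game` is a hypothetical object exposing legal_moves(state),
    apply(state, move), is_terminal(state), and score(state),
    where score is from the perspective of the player to move."""

    def __init__(self, game, depth, heuristics):
        self.game = game
        self.depth = depth            # how many plies to search ahead
        self.heuristics = heuristics  # list of (applies(state), pick(state))

    def choose_move(self, state):
        # 1. Rules of thumb: if one applies, take its answer.
        for applies, pick in self.heuristics:
            if applies(state):
                return pick(state)
        # 2. Otherwise, raw look-ahead to a fixed depth.
        if self.depth > 0:
            return max(self.game.legal_moves(state),
                       key=lambda m: -self._value(self.game.apply(state, m),
                                                  self.depth - 1))
        # 3. Failing that, just guess.
        return random.choice(self.game.legal_moves(state))

    def _value(self, state, depth):
        # Negamax convention: a position's value is the best reply, negated.
        if depth == 0 or self.game.is_terminal(state):
            return self.game.score(state)
        return max(-self._value(self.game.apply(state, m), depth - 1)
                   for m in self.game.legal_moves(state))
```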
Using this picture, we can frame “d” as the capacity for a game system to allow for a ranked population of strategies that provide partial/approximate solutions. The more discrete ranks in such a population, the greater the degree of this property within the game. In other words, the more a game has this property the more it is susceptible to a sequence of partial solutions, each one more complex, requiring more time and/or memory to execute, and providing a better approximate solution to the problem of the game.
On the one hand, this is really just a precise re-statement of the original problem; we’ve simply clarified the thing we went looking for in the first place. On the other hand, such a clarification is a very useful conceptual tool. For example, we are now starting to build skill chains of this type for very simple games and to observe how they change under different conditions.
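Concretely, the measurement looks roughly like this sketch (the tournament format and the threshold are illustrative assumptions here, not a settled methodology):

```python
import itertools

def skill_chain(strategies, play_game, trials=1000, threshold=0.55):
    """Group a population of strategies into discrete ranks.
    `play_game(a, b)` -> True iff strategy a beats strategy b
    (a hypothetical stand-in for running one full game).
    A strategy starts a new rank when it beats every strategy
    below it at least `threshold` of the time."""
    n = len(strategies)
    winrate = {}
    for a, b in itertools.permutations(range(n), 2):
        wins = sum(play_game(strategies[a], strategies[b])
                   for _ in range(trials))
        winrate[a, b] = wins / trials

    # Weakest first, by average win rate across the population.
    order = sorted(range(n), key=lambda i:
                   sum(winrate[i, j] for j in range(n) if j != i))
    ranks = [[order[0]]]
    for i in order[1:]:
        below = [j for rank in ranks for j in rank]
        if all(winrate[i, j] >= threshold for j in below):
            ranks.append([i])    # a genuinely stronger rung
        else:
            ranks[-1].append(i)  # statistically tied with current rung
    return ranks  # len(ranks) is the size of the skill chain
```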
Search and Heuristics – an Information-Theoretic Approach
But this feature of a game’s skill chain is still just a consequence of d. We still want to understand what underlying property of the system leads to this result.
One question to ask is – which kinds of problems lend themselves to partial/approximate solutions and which don’t?
We know that every game of the type we are looking at has some theoretical perfect strategy, some algorithm that tells you exactly what move to make at any point. In fact, we know how to produce this solution in theory (although it is far from practical in any substantial game): As long as it is a game of perfect information, you can calculate out the entire game tree, find a winning node and work backwards looking for a path that your opponent can never force you off of.
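For a game small enough to actually solve, that procedure is just backward induction over the tree. A minimal sketch, with the same kind of hypothetical game interface:

```python
def solve(game, state):
    """Game-theoretic value of `state` for the player to move:
    +1 win, 0 draw, -1 loss under perfect play. Expands the whole
    tree and works backwards; in practice you would memoize on
    `state`, and for any substantial game this is hopeless."""
    if game.is_terminal(state):
        return game.outcome(state)  # from the mover's perspective
    # Pick the move whose resulting position is worst for the
    # opponent - a path they can never force you off of.
    return max(-solve(game, game.apply(state, m))
               for m in game.legal_moves(state))
```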
Imagine a game in which this were the only way to play. Whatever your position, the only way to do better than choosing a move at random is to look ahead in the game tree, calculating move by move until you see the necessary outcome of each move. A game of this type would have only a single dot on the skill chain graph: there’s only one possible strategy – perfect play via raw look-ahead calculation.
What’s missing from this type of game is heuristics – the rules of thumb that players use as a shortcut through the intractable problem of fully searching the game tree: “control the center” and “doubled pawns are bad” in Chess, “hane at the head of two stones” and “empty triangles are bad” in Go.
Heuristics take advantage of regularities in the game tree to guide the player towards areas of state space where winning paths are statistically denser. It’s not true that every Chess position in which you control the center squares leads to a win, but it is true that, on average, these positions contain shorter and more certain paths to winning outcomes. In a sense the heuristic operates as a kind of compression function on the game tree – a map that reveals structural features in the underlying tree. A fuzzy, blurry, tattered map, but a map nonetheless.
In fact, in some ways, heuristics simply are the partial strategies we are talking about. The more heuristics a game can support the more you can combine them into better and better strategies and the more you can improve with effort and study.
But wait – it’s not quite that simple.
Imagine a game in which there was a heuristic that allowed you to play perfectly. In such a game you would never need to search the game tree at all, whatever the situation you would simply apply this one weird trick. This is the opposite way in which a game can fail to support a long skill chain, by having a simple strategy that exploits an underlying regularity of the game tree to make the problem trivial.
Both of these theoretical depth-lacking games – the one that requires pure search, and the one that allows for one simple heuristic – share an interesting property. In both cases the description of how to find the best move is short. In the first game, the description is “search the entire game tree and find the winning move”; in the second game, the description is “always move downhill” (or whatever). Even though the process indicated by the first description is impossibly difficult and that of the second trivially easy, the descriptions themselves are both short.
But in deep games the best description for finding the best move is long and complicated. It will require a mix of heuristics (you probably want to move downhill here) and raw look ahead (actually you need to go uphill here because look – otherwise this bad thing will happen).
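That mix is exactly what a depth-limited search with heuristic evaluation at the horizon looks like. A sketch (the game interface is hypothetical, and everything compressed – “control the center”, “move downhill” – would live inside `evaluate`):

```python
def best_move(game, state, depth, evaluate):
    """Raw look-ahead for `depth` plies, falling back on the
    heuristic `evaluate(state)` where the search has to stop.
    The long, complicated 'description of how to play' is split
    between `evaluate` and the calculation itself."""
    def value(s, d):
        if game.is_terminal(s):
            return game.outcome(s)   # raw calculated certainty
        if d == 0:
            return evaluate(s)       # compressed heuristic knowledge
        return max(-value(game.apply(s, m), d - 1)
                   for m in game.legal_moves(s))
    return max(game.legal_moves(state),
               key=lambda m: -value(game.apply(state, m), depth - 1))
```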
We now believe that the quality we are looking for is a kind of semi-orderedness of the state space, a level of underlying structure that allows for some useful compression of the game tree but not so much that raw search is never needed.
We can frame this in terms of entropy. The best way of looking for something, like a best move, depends on how ordered the space you are searching through is. If the space is completely disorganized then there’s no better way to find the item than to examine every possibility. This corresponds to high entropy. If the space is well-organized I can give you precise instructions about where to look. This corresponds to low entropy.
high entropy = weak heuristics, all search
low entropy = strong heuristics, no search
The description of the best way to find something in a semi-ordered space is long and complicated. Deep games are semi-ordered. This is what the literature that a deep game produces is made of, all the strategy books and blog posts and videos and forum arguments are part of this lengthy description of the most efficient method for finding the best move in a semi-ordered system under conditions of computational constraint.
Final Thoughts
If this is the right direction for thinking about this issue then playing a deep game will involve a complex dance between heuristics and pure search. And sure enough, listening to the real-time thought process of an expert player often reveals just this – the compressed knowledge of proverbs, patterns, and rules of thumb alternating with periods of raw, if/then, move-by-move calculation.
Consider also the “killer move” – the unexpected, spectacular, high-impact play. What makes a killer move surprising and impressive is the way it deviates from a strong heuristic, indicating that heuristic’s limitations. (Watch, for example, Michael Redmond doing a triple take in response to AlphaGo’s 4th line shoulder hit in game 2.) Killer moves demonstrate how raw search can outperform even the most powerful heuristic. How to avoid the trap of a local maximum in solution space? Search!
Speaking of AlphaGo, it is interesting to note that the architecture of the world’s best game-playing AI reflects this same search/heuristic balance: its Monte Carlo tree search uses heuristics to channel the raw power of pure look-ahead, efficiently steering exploration toward the directions of the game tree with the greatest average value, and its board-evaluation functions are heuristics baked into its neural net, patterns discovered through millions of real and simulated games.
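You can see the balance in the selection rule at the heart of Monte Carlo tree search, where accumulated value estimates (heuristic knowledge) trade off against an exploration bonus (raw search effort). A sketch of the standard UCB1 form:

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c=1.4):
    """UCT move selection: exploit what the value estimates say
    (first term) while still spending raw search on under-explored
    moves (second term). `c` tunes the balance."""
    if child_visits == 0:
        return float("inf")  # every move gets tried at least once
    exploit = child_wins / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```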
The human brain itself may demonstrate a similar structure. In Thinking, Fast and Slow, psychologist Daniel Kahneman outlines the two main modules of human cognition – System 1 (snap judgments, intuition, reflex) and System 2 (slow, step-by-step, logical calculation). System 2 corresponds to the theoretically perfect but impractically slow process of pure search, while System 1 corresponds to the compression of reason into powerful but flawed heuristics.
If you allow me to get even more vague and philosophical for a moment, I might even suggest that we see something of the same dynamic in the contrast between two of the world’s most influential systems of moral judgment – utilitarianism (raw search) and Kantian deontology (heuristics).
Now that I think of it, even Hearts and Minds, my own earlier foray into reason and emotion in games seems, in retrospect, to express this exact same idea. Weird.
Anyway, that’s roughly where we are with this project, minus a lot of related issues and specific details. We are looking to put this research into a published paper sometime in the near future. Stay tuned for further developments.
A delightful discussion of depth. Before I get into specific comments, I’m happy that this level of discussion is occurring.
Prediction A: Any rubric for depth that does not include a psychological filter will always result in examples that are interesting for a machine, but uninteresting to a human.
There are mathematical structures to games, but in games that humans choose to play, these are always filtered through human minds. Arguably that is a critical element of any game once we get beyond the rarified world of machine vs machine simulations. Thus you need a person to determine what is meaningful and why. You also need a person within any equation because they bring cultural resources to bear that fundamentally alter which solution set they, as humans, explore.
So perhaps there is a mathematical construct of ‘d’, but it will always be a superset of ‘d(human)’. And if you are making games for humans, that boundary is fuzzy and ultimately dependent on the history, culture, communication systems and skills of the population playing the game.
Prediction B: For certain populations, a game of greater d(math) will be seen as less deep when compared to a game of lower d(math).
A trivial example is Chutes and Ladders vs. Chess: the former feels like it has all sorts of interesting decisions to a 4-year-old, but none to an adult. Children get quite far down the skill chain, coming up with all sorts of heuristics on how to play, while Chess isn’t even a valid game for these same players. The entry barrier (for many children) is too high. But most examples are non-trivial and the boundaries are again fuzzy. We end up regularly in situations where ‘perceived d’ does not correlate with math d.
As an aside, I used the term skill chain back in 2006 to talk about connecting together skill atoms. These allow for the encapsulation of strategies / heuristics in higher-level skill atoms. From a computational perspective, this structure is easy to deal with since it can be represented by a directed tree. (http://www.gamasutra.com/view/feature/129948/the_chemistry_of_game_design.php?page=4)
Another area of inquiry: When I see heuristic vs search, economic costs and satisficing immediately come to mind. I suspect that there’s an interesting relationship between the cost of search and the ability to store pertinent heuristics for later use. Many of the games we consider ‘deep’ certainly allow for the clever breakthrough by an individual, but they are also highly communal research activities once you get outside of an individual match or competition. A rich library of heuristics is the result of thousands (millions!) of players collecting and sharing knowledge within a larger community.
Again, from a more abstract perspective, the community acts as a large memory buffer to preserve learning. Perhaps games that lack this storage facility will never actually be seen as deep, no matter what the properties internal to an individual session might be.
All the best,
Danc.
The ideas behind this conference seem to echo the definition of depth I developed independently years ago. https://critpoints.wordpress.com/2016/05/01/critpoints-glossary/
In my definition, depth was a more limited selection of game states out of the total state space: it removes redundant states (like the chess + two-sided coin you mentioned) and states that are irrelevant to the playerbase – reflecting both the tendency of dominant strategies to limit the game and the way metagame developments reveal more depth, ostensibly making the game deeper – and it rules out states that are beyond the limits of human play, such as arbitrary code execution in most cases, or playing a real-time double-blind game with perfect reaction time. I defined it this way because it was more objective and quantifiable than other definitions I’ve seen, though it’s still up to a degree of subjective interpretation (it’s hard to say which states are redundant in any algorithmic sense, or to track which states become invalid or valid because of metagame shifts). I also notice that elements of my definition pop up in everyone else’s definitions for depth; all the definitions point to very similar things in nature.
As presented in the reaction time example though, a game like street fighter only has depth in relation to the humans that play it. A large component of many games is their double blind element, which are based on the fact that humans have a reactionary blind spot of about 13 frames (215ms). This might not be evident to aliens, it’s certainly not evident to SF Alpha 2 Akuma, who will walk up to you and throw if you block, dragon punch if you attack, and block if you dragon punch.
What counts as redundant may be obvious in the case of these very digital board games (rotations of the same board, multiple states that evolve into the same state, and so on), but it’s a bit fuzzy when it comes to games where the position of each character may be a floating point number, or an integer represented on so fine a scale that it might as well be a floating point number. There are clearly many more possible positions and rotations for players on a given Quake 3 map than there are possible moves in Go, but not as many of them are meaningfully distinct in the way each possible move in Go is. A lot of these states are clearly redundant, but judging precisely which is completely impossible.
Something that probably deserves more investigation and research is the process of developing heuristics over the course of improving at games. How do games scaffold the learning process and enable players to play effectively at each level, while still having a more complex heuristic waiting in the wings? How do we anticipate places where the next step up might be beyond the player’s level? How do we construct games that can continually cause players to reevaluate how the game is supposed to be played? I felt Super Smash Bros Melee does a good job at this in comparison to Street Fighter.
Oh, and good job pointing out that a raw game tree lookup is low skill.
“Imagine a game in which there was a heuristic that allowed you to play perfectly. In such a game you would never need to search the game tree at all, whatever the situation you would simply apply this one weird trick.”
GAME THEORISTS HATE HIM! Local munchkin discovers secret to playing perfect thanks to this one weird trick! HOW? …Just watch this free video>
Nice comments Dan, thanks.
>> Thus you need a person to determine: what is meaningful and why it is meaningful.
Sure – although, in a way, this project is asking the question whether, in this very narrow and specific domain, there is a way to talk about meaning that is independent of human psychology in the same way information theory talks about “messages”.
The problem, again, is that “meaningful” is often used as a superlative to mean “worthwhile, valuable, good”. Historically, claims about applying math to say anything useful about those things don’t have a great track record, and we are right to be skeptical of them. The degree to which this project will avoid similar mistakes? We will have to see!
It’s impossible to avoid the context of AI here. Is generally intelligent AI possible? If so, we will need a model of meaning, and meaningfulness, that isn’t bound to the particular details of human psychology.
Likewise, if the human brain is a universal Turing machine (which it probably is, right?) then at least *some* of the things we find interesting are things that simply, objectively *are* interesting, in a way that would be recognized by any thinking being, anywhere.
>> Chutes and Ladders vs. Chess
Right, the whole point is to articulate the obvious difference between these things even when it is hard to see.
>> We end up regularly in situations where ‘percieved d’ does not correlate with math d.
I will tell you that some of the folks on this project are more interested in perceived depth than I am. For me the whole point is to understand the deep structure of interactive systems and their capacity to support certain kinds of problem-solving. If we happen to have preferences for one or another of these systems based on the fact that we have fingers, or grew up on the savanna, or hate spiders, or whatever, well… this is certainly relevant to me as a game designer but it’s just not the point of this research.
Mind is mind. Game recognize game. Ball don’t lie.
Chris –
Yeah, I think we are on the same track here. Our approach is to isolate these attributes in very small “artificial” games where we know what perfect play is and can fully map the space of different heuristics/strategies, make observations in this micro domain, and then extrapolate these observations to bigger, real-world sized games.
I would like to comment just on the sub-topic of what constitutes a meaningful decision vs. a non-meaningful one.
For a decision in a game to be meaningful, it has to have the following aspects.
First, it must have an effect on the outcome of the game. There are many games which require players to make decisions that do not matter. These are usually decisions which are related to the theme of the game, which we are ignoring, such as whether to play as the iron or the hat in Monopoly. Meaningless decisions most frequently, but not exclusively, occur in games that have states where the result is already decided, but the victory condition has not been met. If I have a mathematically insurmountable points lead in Carcassonne, but there are only a few tiles left in the game, it may not matter at all where they are placed. The result of the game will be unchanged, regardless of what decision is made.
The second thing that is required for a decision to be meaningful is that it must be non-arbitrary. In rock paper scissors, all you do is make a single decision, and that decision matters a lot. It determines everything. However, it is an arbitrary decision. All three options are exactly equal. Picking randomly is in fact better than using any other decision method.
The third requirement for a decision to be meaningful is that the effect on the end of the game must be significant. The more significant the consequences of choosing wrong, or benefits of choosing right, the more meaningful the decision is. A decision that could potentially result in a ten point swing is much more meaningful than a decision that only causes a single point difference in score.
Lastly, for a decision to be meaningful, it has to have many non-arbitrary alternatives. And this point is where I think we begin to see the concept of depth emerge. If a player’s choices at a particular juncture in a game are exactly equal, then we have a meaningless, arbitrary situation. What we are looking for here is a scenario where, after applying available heuristics, there are multiple non-equal options available to the player that still lie upon the Pareto frontier.
Option A is stronger and slower. Option B is faster, but weaker. All other options can be ruled out due to having inferior Pareto efficiency. A and B are not exactly equal, but neither can be ruled out. The player must now make a difficult and meaningful decision between them.
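To make that filtering step concrete, here is a minimal sketch (the option attributes are made up; higher is better on every axis):

```python
def pareto_frontier(options):
    """Drop every option that some other option equals-or-beats on
    every axis (and strictly beats on at least one). What survives
    is the set of genuinely difficult, meaningful choices."""
    def dominates(p, q):
        return (all(a >= b for a, b in zip(p, q))
                and any(a > b for a, b in zip(p, q)))
    return [o for o in options
            if not any(dominates(p, o) for p in options if p != o)]

# Option A: strong but slow. Option B: fast but weak. C: dominated.
print(pareto_frontier([(9, 2), (4, 8), (3, 5)]))  # -> [(9, 2), (4, 8)]
```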
When there are a larger number of viable options available to a player in a particular meaningful decision, we can perhaps say that decision has more depth than another decision with fewer choices. However, as much as you may want to, we cannot ignore human psychology in this aspect. When there are more available valid options than a person can arrange in their mind at once, the decision becomes arbitrary and loses all meaning due to the near-impossibility of making the decision in a meaningful fashion.
Side note: Games appear to have more depth when they offer players the opportunity to make plays that were non-obvious to almost all other observers, but in hindsight were very strong. AlphaGo put stones in places where nobody would have ever thought to put them, and they turned out to be great choices. Players of customizable card games often discover new ways to use old cards that nobody else thought of previously, but end up being very strong combinations. For a game to be deep it must have enough complexity to contain opportunities for this sort of player discovery.
I really enjoyed this. It’s super thought-provoking already and I’m excited to see where it goes.
My question is on the measurability of d if the skill chain has some sort of infinite or fractal nature to it, or even if the map of all possible strategies is beyond some threshold of complexity.
As you say – heuristics are ways to chunk the game tree, in a way. Heuristics can also be discovered or evolved, by players or AI or whatever.
You say that strategies include “what rules of thumb to apply when making decisions,” among other things. So even if the very top of the skill chain doesn’t use heuristics because it can fully grasp the whole tree, a lot of the strategies in the middle of the skill chain do, I’d think.
My intuition is that there is some fractal or just infinitely granular nature to the skill chain for some types of games. This is partly because as players (human or AI) spend more time with the game, more heuristics will be developed, discovered, evolved, etc. As a result, more strategies will exist. Not all of these heuristics and new strategies will be used on higher levels of the skill chain, but they’d at least exist on it somewhere, probably. It’s not obvious to me that, for a certain type of complex game, even after it is “solved,” all possible intermediate strategies could be exhausted. Admittedly, this line of thinking may be straying towards games without perfect information, which may be slightly less relevant currently?
For certain other kinds of games, though, I can imagine all strategies being theoretically exhausted (I guess Chess is an example of this), but then I have trouble imagining how they’d be meaningfully broken down into a number of discrete ranks on the skill chain in any way where some metric d could be pulled from it.
If d is the “capacity for a game system to allow” for “more discrete ranks” of strategies on the skill chain, then I’m left wondering two things:
1.) How do you define a discrete rank, as well as how many discrete ranks a skill chain has, in practice? I know you say: “The size of a game’s skill chain is the number of distinct steps in the ranking of all players, where players at each step beat all the players at lower steps some significant percentage of the time,” but I’d be curious to see what this looks like with an example. How do the ranks and percentage measurements actually get determined? Is that method only relevant for games where you can practically and readily map all possible strategies and heuristics? For games where that is not practical, could d still be measured somehow?
2.) As I mentioned above, my intuition is telling me that there are some types of games with some property where the skill chain of the game is infinitely granular or fractal. Maybe this property isn’t actually related to a d-threshold, or maybe it’s related to a hybrid of some other factors + a d-threshold? Are these then a type of game where d cannot actually be measured? If so, can the Depth Project’s framework still say meaningful things about these games? Or alternately, do you think my intuition, having admittedly not spent very much time thinking about this, is wrong and that these types of games don’t exist? :)
Again – really interesting stuff. Can’t wait to see what’s next from TDP.
Hey Scott –
>> For a decision in a game to be meaningful, it has to have the following aspects.
>> First, it must have an effect on the outcome of the game.
At the risk of seeming petty, let me suggest a possible exception: a decision that has no effect on the outcome of the current game but allows you to increase your understanding of the game in a way that could have an impact on future games. I literally just saw this happen in a playthrough of 868-Hack (https://youtu.be/6wgkG3xsKHM?t=34m51s). Likewise, in your Carcassonne example, if the players are learning something useful about the game system in the course of playing through an endgame with an inevitable outcome, then I’m reluctant to say their decisions are meaningless.
Games are full of situations like this, especially single-player games, in which players can (and frequently do) play *as if* there were a clear structure rendering their decisions consequential, even when that structure is only suggested or is entirely absent. Players are pretty good at playing as if they were operating under a general edict to get good at the game, to understand it as deeply as possible, to continually refine their strategy to move it closer and closer to perfect play. Most play occurs within the framework of this imaginary edict, and I think we should acknowledge these kinds of decisions as meaningful in a way that distinguishes them from iron vs hat in Monopoly or red vs blue in red/blue Chess.
>> The second thing that is required for a decision to be meaningful is that it must be non-arbitrary.
>> Picking randomly (in R/P/S) is in fact better than using any other decision method.
There are many situations, e.g. in an R/P/S tournament against unknown opponents some of whom may play non-randomly, where this just isn’t true.
>> The third requirement for a decision to be meaningful is that the effect on the end of the game must be significant. The more significant the consequences of choosing wrong, or benefits of choosing right, the more meaningful the decision is. A decision that could potentially result in a ten point swing is much more meaningful than a decision that only causes a single point difference in score.
If the goal of the game is to outscore your opponent, then the amount of points you score is less significant than the increase in % chance to win. This is one of the lessons we are learning from AlphaGo (via its infamous “slack” moves.)
>> Lastly for a decision to be meaningful, it has to have many non-arbitrary alternatives…
Yeah, this is a key point I think. There is a recursive, meta, level-jumping aspect here. What we are really talking about is not the algorithm for playing, say, Chess; it is the algorithm we use to develop the algorithm for playing Chess, our strategies for searching through strategy space.
>> However, as much as you may want to, we cannot ignore human psychology in this aspect. When there are more available valid options than a person can arrange in their mind at once, the decision becomes arbitrary and loses all meaning due to the near-impossibility of making the decision in a meaningful fashion.
But humans aren’t the only example of problem-solving under conditions of computational constraint! Arguably *all* thinking exists in the same context of imperfection striving to improve itself.
>> For a game to be deep it must have enough complexity to contain opportunities for this sort of player discovery.
I agree, this does seem intuitively true, and is worth considering as a criterion for the quality we are seeking.
Hi Rob, great questions.
First, in our approach, you can theoretically have a rich, multi-step skill chain in a game that is already fully solved. The lower steps in this skill chain are all strategies that are smaller/faster/less complex than the ones above them. Admittedly, this is a bit hand-wavey at this point, but we are talking about some combination of the Kolmogorov complexity of the strategy itself + the time it takes to run + the memory required. This process of evolving more and more complex, effortful, and powerful strategies (roughly) reflects the learning process that human players experience in their journey into deep games.
This becomes an issue when there is a strategy producing perfect play which is smaller/faster/less complex than lower steps on the chain: all those lower steps would be wiped out, and d would be shown to be smaller, in fact, than previously thought.
In terms of avoiding fractal recursion, non-transitive strategies, and other pitfalls of practical application, the main idea is that there is no pure, abstract version of the d of any game, it never exists outside the context of a specific, well-defined set of conditions, including most importantly a language within which to express strategies. Therefore you can only measure the d of a game for such-and-such language and conditions, but this still gives you a way to compare games to each other and make hopefully useful observations.
As to whether such a measure might be discovered for an actual, real-world, deep game, the short answer is it probably can’t? We are in the early stages of trying to do it for smaller games, notably Blackjack, and we think it will be interesting to, for example, do it for tiny versions of Go (2×2, 3×3, 4×4, etc) and then try to extrapolate from that to implications for real-world Go (and beyond.)
The article is quite interesting, but I think it lacks a bit of rigor, even considering it is informal.
1) In the first graph, for example, what do you identify with “solution strength”?
The “correct” parameter would be the rate of error per decision made, where an error is identified as a decision which does not conserve the game theoretic value (win or draw) of a position.
But to evaluate this parameter you need to know the perfect play in any position, which can only be done for trivial games.
If you use a proxy for solution strength (like Elo), the value is influenced by factors like the length of the game (i.e. the number of non-moot decisions) and cycles (A beats B who beats C who beats A).
2) Also in the first graph, how do you define a “strategy”?
In game theoretical sense, a strategy is a plan of action for every possible position. This can be modeled as a look-up table (position–>move) or as an algorithm which will always return a legal move when the input is a position.
But the “Strategy Space” (TS) that you need to observe is incredibly large.
You can estimate it based on the “State Space” (SS) and the average branching factor (bf):
TS ~ bf^SS
This is much bigger than even the game tree complexity (GT), which is estimated from the average game length (gl):
GT ~ bf^gl
To give an idea, I took 3 games:
TicTacToe –> SS=10^3, GT=10^5, TS=10^620
Othello –> SS=10^28, GT=10^58, TS=10^(10^28)
Go –> SS=10^170, GT=10^360, TS=10^(2*10^170) !!
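These magnitudes are easy to sanity-check on a log scale, since TS ~ bf^SS means log10(TS) = SS * log10(bf). For example (the branching factors below are rough, commonly cited averages):

```python
import math

# (game, state space exponent, rough average branching factor)
games = [("TicTacToe", 3, 4), ("Othello", 28, 10), ("Go", 170, 250)]

for name, ss_exp, bf in games:
    log_ts = (10 ** ss_exp) * math.log10(bf)  # log10(TS) = SS * log10(bf)
    print(f"{name}: log10(TS) ~ {log_ts:.2g}")
# TicTacToe ~ 6e+02, Othello ~ 1e+28, Go ~ 2.4e+170
```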
3) It looks like you mean something different with “strategy”. Each circle (which would be a strategy in game theory) is connected by a line; if I interpret correctly, then each strategy is this line, and it would give different results with different computational resources.
So a strategy in this definition is an algorithm (possibly time-bound?) which would output differently based on the resources given.
But the issue with this is that it’s implementation-dependent; in this case the possible strategies are indeed infinite, so you can’t really find a granularity or a “chain”.
Moreover, you seem to assume that each “strategy-algorithm” asymptotically converges to perfect play; while this is true for alpha-beta or Monte Carlo UCT, it is not universal for “any algorithm” (think of “null moves” or “pruning”).
Lastly, it is also possible for two strategies to scale differently with resources, so to “cross” in your graph.
4) I like the analysis of the “highly ordered” games with “one weird trick” or a “superpowerful heuristic”. Typical examples would be Nim or Bridg-It. But the issue is that the “weird trick” may be there without anybody being aware; how do we know that there is not a “trivial” solution to Chess or Go? It is very similar to solving P vs NP, if you ask me.
Happy Bloomsday. May your observations be conducted from a lively Martello Tower surrounded by a deep and calm sea, and may your journey to town brim with discovery.
Although you dismiss branching factor and depth because they can be easily manipulated in many games, I think there is a different question that can be asked using these measures, one with more meaning.
In particular: What knowledge can the player have/gain that reduces the effective branching factor and the depth?
So, in your trivial two-color coin example, the player gains the knowledge that the coin is irrelevant and thus reduces the size of the state space.
This causes us to focus on the question: for a given game, how many such constraints can be found, and how do they impact the size of the game? I’d argue that the best games have many layers of constraints that can be found, but they never reduce the game completely. If I know more constraints than you do, I can reduce what I need to analyze in the game and have an advantage over you.
The question which I can’t (yet) answer is how a machine can find/learn and express these constraints in a compact and usable way.
Hey, great piece, Frank! My favorite article I’ve seen of yours. I would love to think that some conversations we’ve had recently have slightly inspired this project. Either way, it’s something I’m very interested in, too.
A quick thing: DanC brought up that whole “Candyland” concept – how a child finds Candyland to be deep but an adult does not. This is supposed to be evidence that a game’s depth is subjective. I actually think the better way to look at it is that games have an objective amount of depth, and different humans, depending on their cognitive ability/skill, are able to reveal more and more of whatever depth is there. Candyland is just a shallower game than League of Legends, so a child (or an adult with adequately low cognitive ability) is unable to discover the limit of its depths. But this spectrum goes across all games, really. Smarter humans are attracted to games with more depth because they solve/semi-solve games with less depth. This does not suggest that the amount of depth a system has changes based on a person’s perception.
FTA:
>The more heuristics a game can support the more you can combine them into better and better strategies and the more you can improve with effort and study. But wait – it’s not quite that simple. Imagine a game in which there was a heuristic that allowed you to play perfectly.
Why did we jump from the “many heuristics” vision to “one single heuristic”? What I would have liked to have seen is an explanation of this specific point: Why is a game with a shitload of heuristics and no hard calculation considered to have less depth than a game that has both heuristics and calculation?
(My argument on this, for the record, is that it’s impossible to make a game that is “only heuristics” yet isn’t extremely random and inefficient.)
Anyway, great article! Sorry I only found it now.
Oh also, Frank – it seems to me that what you’ve written here almost directly backs up what I wrote about in my Information Horizon article (http://keithburgun.net/uncapped-look-ahead-and-the-information-horizon/). I’d love your thoughts on contrasting that article with this one.
Very interesting, Frank!
Quick question for clarification. The article states, ‘You can have a game whose solution is arbitrarily difficult to compute without guaranteeing that the game will have the kind of features we are looking for – the capacity to reward life-long learning and support a large, long-term community of serious, dedicated players.’ Are you implying that this feature is ‘depth,’ the topic of the study? It seems you are saying that this feature is desirable and I would agree, but doesn’t the desirability of the feature make it normative and place value on depth?
The criteria Scott provided for meaningful choices in a game and the revisions you added are interesting. Maybe I’m missing something, but it seems that Scott’s definitions were more independent of player psychology than the revisions you provided. For example, the reference to R/P/S choices becoming meaningful when an unknown opponent plays non-randomly incorporates player psychology and understanding the opponent’s psychology. Maybe I struggle with the idea that players and their strategies and choices cannot be separated from human psychology. Similarly, while games are mathematical, the decisions designers make cannot be separated from their psychology.
What makes this discussion particularly interesting to me is that the desired state of life-long learning toward mastery (and depth), meaningful choices made while learning for future play, and game design that motivates these kinds of player decisions are some of the opportunities games offer for education.
Thanks for the clarification.
I’m not sure I understand, but I very much do want to understand. Specifically how it applies to standard JRPGs, but a basic understanding sounds useful. (I also understand most JRPGs aren’t particularly deep, mechanically speaking)
>> In both cases the description of how to find the best move is short.
>> But in deep games the best description for finding the best move is long and complicated.
1) It’s subjective, but: How long would the description be for the best move in a deep game? Is the scale for the whole game or an encounter?
2) Search, heuristics, and semi-ordered spaces. Can I get some real-world examples, perhaps JRPG-related? I think “Always heal when below 30% health” is a heuristic, and “Always heal when below 30% health, unless you’re fighting Biggs” is technically a deeper game. I’m not sure what a useful heuristic size is in order to allow for a semi-ordered game space. Turning my own example into code (all names made up), I picture something like this:
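```python
def choose_action(member, enemy):
    """A layered JRPG combat heuristic. Each exception makes the
    full description of 'how to play well' a little longer, which
    (if I follow the article) is where the depth shows up."""
    low_hp = member.hp < 0.3 * member.max_hp
    if low_hp and enemy.name == "Biggs":
        return "attack"  # the exception, learned the hard way
    if low_hp:
        return "heal"    # the rule of thumb
    return "attack"
```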
Maurizio, thanks for the detailed feedback. I will try to answer your questions as best as I can myself and will also make sure the rest of the team sees them.
>> 1) In the first graph, for example, what do you identify with “solution strength”? […] If you use a proxy for solution strength (like Elo), the value is influenced by factors like the length of the game (i.e. the number of non-moot decisions) and cycles (A beats B who beats C who beats A).
Yes, by “solution strength” we basically mean rating based on performance in some kind of very thorough tournament. It is true that this leads to potential problems like non-transitive strategies (A>B>C>A), but we feel this is still the best criterion for evaluating partial solutions in the context of games.
>> 2) Also in the first graph, how do you define a “strategy”?
You start by choosing a well-defined language for expressing strategies. Every strategy is a complete instruction for how to play the game expressed in this language. In this method you can never look at a complete population of all possible strategies for the game, you can only ever compare the strategies that are expressible in this language. So you would never be able to say anything about the “true d” of a game, you can only talk about the d within language L.
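As a toy illustration – this particular language is invented for the example, not the one we are actually using – a strategy language can be as simple as an ordered list of condition/response rules plus a default:

```python
# A toy strategy language L: every strategy is an ordered list of
# (named condition, response) rules, closed by a default action.
# d is then measured relative to L, never in the abstract.
strategy = [
    ("winning_move_available", "take_it"),
    ("opponent_threatens_win", "block"),
]
DEFAULT = "random_legal_move"

def play(state, has_feature):
    """`has_feature(state, name)` is a hypothetical oracle that
    checks a named condition of the position."""
    for condition, response in strategy:
        if has_feature(state, condition):
            return response
    return DEFAULT
```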
>> 3) It looks you mean something different with “strategy” […] So a strategy in this definition is an algorithm (possibly time bound ? ) which would output differently based on the resource given.
In this graph strategies are dots, not lines. The x-axis is basically algorithmic complexity. We’re thinking of it as strategies that require more and more computational resources to execute, rather than strategies that give different outputs based on resources given. This is admittedly the least-worked-out part of the whole approach and most in need of additional thinking and clarity.
>> how do we know that there is not a “trivial” solution to Chess or Go ?
In our approach, the discovery of a trivial solution to a game would, in fact, cause us to re-evaluate the d of the game and show it to be lower than we previously thought.
Keith,
>> Why did we jump from the “many heuristics” vision to “one single heuristic”? What I would have liked to have seen is an explanation of this specific point: Why is a game with a shitload of heuristics and no hard calculation considered to have less depth than a game that has both heuristics and calculation?
This is a really good question.
A game with a ton of heuristics would, in fact, have a high degree of d, because the algorithm for describing the best strategy would be very complex.
In a sense, it’s all search. Heuristics are ways of compressing and speeding up search. Every heuristic is a shortcut that reduces the amount of raw search you do. Intuitively, in a game with a high degree of depth, you would expect strategies in the middle ranges to have a balance of heuristics and raw search, because the places they do raw search are the places where they can be improved by additional heuristics.
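To make “heuristics compress search” concrete: in alpha-beta search, a heuristic used only for move ordering doesn’t change the final value at all, it just lets you prune more of the tree. A sketch, with the same kind of hypothetical game interface as before:

```python
def alphabeta(game, state, depth, alpha, beta, order_hint=None):
    """Negamax alpha-beta. If `order_hint(state, move)` (a heuristic
    guess at move quality) is supplied, promising moves are tried
    first; the alpha/beta window then tightens sooner and more of
    the tree is cut off. Same value, less raw search."""
    if depth == 0 or game.is_terminal(state):
        return game.score(state)
    moves = game.legal_moves(state)
    if order_hint is not None:
        moves = sorted(moves, key=lambda m: order_hint(state, m),
                       reverse=True)
    value = float("-inf")
    for m in moves:
        value = max(value, -alphabeta(game, game.apply(state, m),
                                      depth - 1, -beta, -alpha,
                                      order_hint))
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # pruned: the heuristic got us here faster
    return value
```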
Truly fascinating. Questions:
-what are the heuristics for creating semi-ordered game state spaces? (or ordered/disordered)
-you talked about the number of steps on the strategy ladder indicating depth; what about the importance of the “top heuristic” (same as “perfect play”?), the skill ceiling (computational resources?), and, something I haven’t seen you mention anywhere, the “clustering” of the top heuristics?
In other words, how tight a cluster is formed by the top heuristic and the ones right after it? (say, the top 10 heuristics)
We could imagine a game where there are many steps, where the top heuristic requires lots of CR, BUT where there is a large gap between that heuristic and the second-best heuristic.
This would almost feel like a dominant strategy, or a “pure heuristic” game as you mention in your “Depth in Strategic Games” paper, but not quite. I think there is something there… What do you think?