Testing Stoneforge Mystic in Modern: Part Two

Here it is. The actual data from my investigation into Stoneforge Mystic. After well over 600 matches with my Abzan test decks, I can finally give a decidedly data-driven answer to whether or not Stoneforge deserves its place on the banlist and how it would impact Modern.


I tried to be as clinical and scientific as possible in my treatment of the material and how I approached my results. I initially intended to include an actual statistical study, complete with confidence intervals and regression analysis, but my data set proved too small for a reasonable margin of error. That requires at least twice as much data, realistically triple, and I don’t have the time for that. If anyone wants to do their own data collection and add it to mine, you are welcome to try.
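To give a rough sense of scale for that margin of error, here is a minimal sketch. It assumes a 95% confidence level and the normal approximation for a proportion, neither of which is specified in the article, and the function name `margin_of_error` is purely illustrative. The 50-game figure matches the per-matchup game one samples reported in the data section.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a win rate.

    n: number of games; p: assumed true win rate; z: 1.96 for roughly 95%.
    These defaults are assumptions for illustration, not the author's method.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare a 50-game sample with double and triple that amount.
for games in (50, 100, 150):
    print(f"{games} games: +/- {margin_of_error(games) * 100:.1f} points")
```

Under these assumptions a 50-game sample carries a margin of roughly 14 points either way, and tripling the sample brings that down to about 8 points.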

Hypothesis

If the goal is to be as scientific as possible, then we need a hypothesis to actually test. You cannot just do experiments, you need to be trying to prove or disprove something. For this test I want to avoid something too broad like, “The viability of unbanning Stoneforge Mystic.” After looking through postings on the subject throughout the greater Modern community, there is a consensus about why to unban Mystic. To the best of my ability to decipher, it is:

Stoneforge Mystic would allow more fair midrange and control decks to exist in Modern by slowing down aggressive decks and reducing their metagame presence.

Therefore, I will be performing my investigation with this hypothesis in mind. If Stoneforge slows the format down and makes fair decks a more attractive option then it is a good candidate for unbanning. If it does not accomplish that goal it should stay banned.

Describing Matches

I’m going to provide general impressions of the matchups rather than detailed descriptions. Trying to be specific about 600+ matches is a Sisyphean task to write and incredibly boring to read. In addition, most of the games can simply be described as, “If Abzan runs the opponent out of resources it wins, if not it loses.” If you want more detail about a specific matchup ask in the comments.

I took copious notes and statistics as I went through and any particularly interesting and relevant ones are included in addition to the research I was actually conducting. Hopefully they will serve to provide additional color to my results and insight into my conclusions.

I looked up the consensus wisdom about whether Abzan was favored or unfavored before each test, partially to guide how I approached the matchup (I wasn’t an Abzan player before this test) and to provide some guidance for the validity of the testing. I will include the consensus evaluation alongside my own findings.

The Stoneforge Data

I will begin by discussing Stoneforge Mystic herself. For the most part, the way I played Mystic did not change between matchups so it makes sense to deal with the associated data separately. Unless otherwise noted, I played Mystic at the earliest possible opportunity and fetched Batterskull.

  • Opening Hands Containing Mystic: 54%
  • Games including Mystic: 72%
  • Average Mystic Turn: 3.80
  • % Total Games Mystic Played on Turn 2: 49%

This is remarkably close to the results that Sheridan reported. The decreased average Mystic turn is most strongly the result of play adjustments made for the Jeskai matchup. The increased turn two percentage comes from the additional games played and mulliganing decisions, held down again by the Jeskai adjustment, which will be explained in the appropriate section.

Data

Without further ado, in the order that I tested them, here are the results of my investigation by deck with sideboarding information. The specific decklists are in my article from last week. I didn’t record sideboarding strategies from my opponents unless they changed them because of Mystic. Also, I never made matchup-specific considerations for my game one mulligans to try and keep things as “real world” as possible.

Infect

Stock List

The community believes that Abzan is favored thanks specifically to Lingering Souls, but it needed disruption to be safe. I tended to mulligan for Souls and/or Inquisition in all my matches as a result.

  • Game One Win %: 48% (24/50)
  • Match Win %: 64% (32/50)

Abzan Sideboarding:

-2 Painful Truths
-2 Tasigur, the Golden Fang

+1 Engineered Explosives
+2 Fulminator Mage
+1 Curse of Death’s Hold

Games were not expected to go long and grinding wasn’t a factor. Either Abzan died quickly or Infect was successfully exhausted in the opening few turns. I lost a number of game ones due to hands with unplayable Tasigurs, or tapping out for Truths, so they get cut rather than interactive cards.

Losing the dead cards for more answers to Inkmoth Nexus and Blighted Agent drastically improved the matchup, proving the consensus correct. The few games Curse hit play my opponent scooped.

Mystic Abzan

I mulliganed a little less because of keeping hands with Stoneforge Mystic and disruption, but that didn’t impact the matchup very much. It was decent against Glistener Elf, but less effective than Souls had been.

  • Game One Win %: 50% (25/50)
  • Total Match Win %: 62% (31/50)

Abzan Sideboarding:

-1 Sword of Feast and Famine
-1 Tasigur, the Golden Fang
-1 Scavenging Ooze

+1 Sword of Fire and Ice
+1 Engineered Explosives
+1 Slaughter Pact

I decided that the additional creature kill was more important here than discard, so I swapped Swords. Explosives and Slaughter Pact were the best removal in the sideboard and I couldn’t cut too many threats or any interaction to bring in more. In the end I didn’t miss the Fulminator Mages and their absence didn’t affect the matchup. I did miss the “I win” aspect of Curse several times. Maelstrom Pulse didn’t have much impact game one so I didn’t think it necessary games two and three.

Burn

Stock List

Burn used to be considered a very good matchup for Abzan, but that was when it ran Kitchen Finks maindeck. Opinions on whether Siege Rhino actually replaces Finks appear mixed, and consensus has weakened on the matchup. I’ll take the assumption that it has moved to being more even these days.

  • Game One Win %: 40% (20/50)
  • Total Match Win %: 52% (26/50)

Abzan Sideboarding:

-2 Painful Truths

+2 Timely Reinforcements

Truths cost too much life and Timely was just as many cards if you follow the Philosophy of Fire. Nothing else was expected to have an impact. Thoughtseize was not cut because it gained an average of one life per use. There was also nothing else I wanted to bring in.

Burn Sideboarding:

-3 Searing Blaze

+3 Path to Exile

On the Draw:

-1 Monastery Swiftspear
-1 Goblin Guide

+1 Skullcrack
+1 Deflecting Palm

Abzan didn’t give Burn that many targets for Blaze so they were cut for the Siege Rhino answers. On the draw creatures were shaved on the assumption that they would be neutralized earlier and that Burn would have to play a longer game.

The extra lifegain was very important to pulling up Abzan in games two and three, but most wins still came from early disruption followed by Tarmogoyf. Siege Rhino was impactful, but also easy to anticipate and counter with a sandbagged Atarka’s Command or Skullcrack, frequently resulting in Burn victories.

Mystic Abzan

I mulliganed fairly aggressively for Mystic in games two and three, assuming that Batterskull would be very important.

  • Game One Win %: 66% (33/50)
  • Total Match Win %: 60% (30/50)

Batterskull was quite important, and in game ones Burn only won once when it was played turn three thanks to a lot of Skullcracks.

Abzan Sideboarding:

-1 Sword of Feast and Famine
-2 Thoughtseize
-1 Maelstrom Pulse

+1 Sword of Fire and Ice
+2 Duress
+1 Scavenging Ooze

The Sword was changed only because protection from red was relevant. Pulse was unnecessary, and with fewer lifegain spells and Duress available, Thoughtseize was less necessary.

Burn Sideboarding:

-4 Eidolon of the Great Revel
-4 Lava Spike

+4 Destructive Revelry
+1 Searing Blaze
+3 Path to Exile

My opponent wasn’t sure what to take out, but knew what to bring in. Answering Stoneforge and Batterskull was the priority and it happened enough to improve the win percentage. I was frequently forced to slow-roll Mystic thanks to hands containing multiple answers, which meant I often had no pressure.

Ad Nauseam

I know that the Worlds coverage team claimed that combo decks are good against GBx decks, but in my experience the matchup is slightly tilted in GBx’s favor, since disruption plus a clock is good against combo and that’s all GBx does. This is especially true of combo decks that require multiple cards to win.

Stock Abzan

Consensus was mixed, with the stronger one saying Ad Nauseam is favored. I suspect the dissent comes from the recent addition of Collective Brutality.

  • Game One Win %: 42% (21/50)
  • Total Matches Won: 46% (23/50)
  • Ideal Abzan Opening Hands: 24% (12/50)
  • Ideal Abzan Opening Win %: 91.7% (11/12)
  • Times Ad Nauseam killed itself: 10% (5/50)
  • Times Ad Nauseam would have died anyway: 60% (3/5)

A few things to explain: by “ideal Abzan opening” I mean Thoughtseize, Tarmogoyf, Liliana openings. Ad Nauseam killing itself counted the times it died to Spoils of the Vault, either from exiling all its win conditions or actually killing itself from life loss. My opponent was usually forced to play it to prevent dying next turn and more often than not it didn’t work.

Abzan Sideboarding:

-4 Path to Exile
-3 Scavenging Ooze

+3 Stony Silence
+2 Fulminator Mage
+2 Surgical Extraction

Take out dead cards, add in less dead cards. Extraction comes in because taking all of a combo piece, especially Ad Nauseam itself was frequently game over. This was balanced by how powerful Leyline of Sanctity was against Abzan.

Mystic Abzan

It was correct to find Sword of Feast and Famine in this matchup. While the discard was too slow to stop a fast combo, it helped tighten the screws and ensure victory.

  • Game One Win %: 40% (20/50)
  • Total Matches Won: 48% (24/50)
  • Times Ad Nauseam killed itself: 16% (8/50)
  • Times Ad Nauseam would have died anyway: 87.5% (7/8)

I really missed the additional Thoughtseize in the other Abzan list. Sword never won a game where I was behind, but it put me over the top of a few.

Abzan Sideboarding:

-3 Path to Exile
-1 Tasigur, the Golden Fang
-1 Scavenging Ooze
-1 Lingering Souls

+2 Stony Silence
+1 Maelstrom Pulse
+2 Duress
+1 Liliana of the Veil

All the extra discard had a significant impact, as did an additional answer to Leyline.

Merfolk

Full disclosure: by this point I was well over 300 matches in with the two Abzan decks and the practice was having an effect on the results. My play and subsequent win rates improved by an unquantifiable amount as I moved through the decks.

Stock Abzan

Based on my experience I expected this to go Merfolk’s way. Abzan has less removal than Jund, and Merfolk can power through Lingering Souls more easily than other aggro decks. Abzan wins when it can race with large creatures.

  • Game One Win %: 36% (18/50)
  • Total Matches Won: 44% (22/50)

In game one Merfolk’s speed and mana disruption were decisive and Abzan struggled to find its feet. It required a critical mass of answers plus a good clock to power through Merfolk’s redundancy, and that didn’t come together that often.

Abzan Sideboarding:

-3 Thoughtseize
-1 Lingering Souls
-2 Tasigur, the Golden Fang

+1 Engineered Explosives
+2 Timely Reinforcements
+1 Curse of Death’s Hold
+2 Damnation

Take out the less impactful or slow cards and find sweepers and board-cloggers. Truths stayed because you really needed to find a sweeper or two to win.

It didn’t change between decks, but I did record Merfolk’s sideboarding.

Merfolk Sideboarding:

-4 Aether Vial
-2 Spell Pierce

+3 Tectonic Edge
+3 Relic of Progenitus

Merfolk wants more mana disruption and cantrips rather than tempo cards. Relic also answers Ooze, Tarmogoyf, and sometimes Souls.

Mystic Abzan

The plan was to use Batterskull to race as much as possible. Having slightly less disruption made this far more important.

  • Game One Win %: 44% (22/50)
  • Total Matches Won: 46% (23/50)

Access to Batterskull did make racing easier and made Lingering Souls much more impressive.

Abzan Sideboarding:

-2 Thoughtseize
-1 Tasigur, the Golden Fang
-1 Sword of Feast and Famine

+1 Scavenging Ooze
+1 Engineered Explosives
+1 Slaughter Pact
+1 Maelstrom Pulse

Having fewer sweepers was crippling for the deck and the win percentage suffered as a result. Swords were swapped mostly for protection colors as Fire and Ice killed very few creatures when Abzan wasn’t going to win anyway.

Death’s Shadow

Stock Abzan

Consensus apparently hasn’t been reached about Abzan vs. Death’s Shadow, except that Abzan really needs to watch out for an instant kill. On the one hand Abzan is good against Zoo but it’s not so good against trampling double-strikers.

  • Game One Win %: 46% (23/50)
  • Total Matches Won: 52% (26/50)
  • Game One Death’s Shadow is Zoo: 62% (31/50)
  • Game One Win %, Death’s Shadow is Zoo: 65.2% (15/23)
  • Adjusted Win %, Death’s Shadow is Zoo: 48% (15/31)
  • Game One Death’s Shadow Combos: 38% (19/50)
  • Game One Win %, Death’s Shadow Combos: 34.8% (8/23)
  • Adjusted Win %, Death’s Shadow Combos: 42% (8/19)

By Zoo vs. Combo I mean the games where Death’s Shadow played creatures and attacked over a number of turns and ground down Abzan’s life total like a traditional aggro deck, vs. wins by combining Temur Battle Rage and Become Immense. Pulling off the combo was harder than expected. DS was able to find the pieces easily enough—it was keeping Temur Battle Rage and Become Immense in hand long enough that proved challenging. The strategy for Abzan was to not lose and let DS nearly kill itself, then win with Spirit tokens.
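Because the two percentages reported for each play style use different denominators, a short sketch of the arithmetic may help. The counts are the stock Abzan game one numbers listed above; the variable names are made up for illustration.

```python
# Stock Abzan game one results against Death's Shadow (from the list above).
abzan_wins = 23                      # total game one wins out of 50
games = {"zoo": 31, "combo": 19}     # how Death's Shadow played each game
wins = {"zoo": 15, "combo": 8}       # Abzan wins in each kind of game

for style in games:
    share_of_wins = wins[style] / abzan_wins   # the "Game One Win %" line
    adjusted = wins[style] / games[style]      # the "Adjusted Win %" line
    print(f"{style}: {share_of_wins:.1%} of Abzan's wins, "
          f"{adjusted:.1%} win rate in those games")
```

The first figure says where Abzan's wins came from. The adjusted figure is the one that reads as a win rate against that style of game.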

Abzan Sideboarding:

-2 Painful Truths
-3 Scavenging Ooze

+2 Timely Reinforcements
+1 Engineered Explosives
+2 Damnation

In comes the anti-creature cards, out go clunkers. Gaining extra life and sweeping the board was very helpful. Ooze never got to grow due to mana constraints and Truths was unnecessary. I wanted to leave in as many naturally large creatures as possible.

Death’s Shadow Sideboarding:

On the Draw:

-3 Steppe Lynx

+3 Hooting Mandrills

On the play my opponent thought that the maindeck was fine, but on the draw they thought my discard would assist playing Mandrills early, which was correct for the most part.

Mystic Abzan

Batterskull used to be very good at beating fair creature decks, so I assumed that reliable access would improve things for Abzan.

  • Game One Win %: 44% (22/50)
  • Total Matches Won: 52% (26/50)
  • Game One Death’s Shadow is Zoo: 58% (29/50)
  • Game One Win %, Death’s Shadow is Zoo: 81.82% (18/22)
  • Adjusted Win %, Death’s Shadow is Zoo: 62% (18/29)
  • Game One Death’s Shadow Combos: 42% (21/50)
  • Game One Win %, Death’s Shadow Combos: 18.18% (4/22)
  • Adjusted Win %, Death’s Shadow Combos: 19% (4/21)

I didn’t record whether the change in the number of Zoo vs. Combo games was because I disrupted the combo less or they found it more. Still, Abzan logged an improvement in games where DS was forced to play fair like traditional Zoo.

Abzan Sideboarding:

-1 Sword of Feast and Famine
-1 Scavenging Ooze
-1 Tasigur, the Golden Fang

+1 Sword of Fire and Ice
+1 Engineered Explosives
+1 Slaughter Pact

I swapped Swords mostly because discard wasn’t very relevant by the time it hit and I wanted to draw more answers.

Death’s Shadow Sideboarding:

-2 Mutagenic Growth

+1 Ancient Grudge
+1 Dismember

On the Draw:

-3 Steppe Lynx

+3 Hooting Mandrills

My opponent wanted some answers to Batterskull and cut the least impressive card to do so.

Jeskai

Last matchup!

Stock Abzan

Consensus says it’s a very even matchup and whoever wins the attrition fight wins the match. Nahiri gives Jeskai the potential for free wins, but in practice she never does unless Jeskai already won the attrition fight.

  • Game One Win %: 48% (24/50)
  • Total Matches Won: 50% (25/50)
  • Game One Jeskai suspends Ancestral Vision turn one: 46% (23/50)
  • Game One Jeskai suspends Ancestral Vision turn one and wins: 87% (20/23)

Winning attrition by drawing cards was quite good and when Jeskai drew more at no cost it had the advantage.

Abzan Sideboarding:

-3 Abrupt Decay

+1 Painful Truths
+2 Fulminator Mage

Extra cards and mana disruption are pretty effective against Jeskai.

Jeskai Sideboarding:

-3 Remand
-1 Vendilion Clique

+2 Spreading Seas
+1 Celestial Purge
+1 Wrath of God

Same plan applies for Jeskai. Remand is great only against flashbacked Souls so it was cut for more impactful cards.

Mystic Abzan

After the practice games we adjusted how I played Stoneforge Mystic. Initially I just played it as soon as possible, but that frequently allowed Jeskai to adjust how it sequenced its plays to answer the equipment more effectively, so I began playing it as the last threat once Jeskai was down on cards.

  • Game One Win %: 54% (27/50)
  • Total Matches Won: 52% (26/50)
  • Game One Jeskai suspends Ancestral Vision turn one: 52% (26/50)
  • Game One Jeskai suspends Ancestral Vision turn one and wins: 80.77% (21/26)

Jeskai got a few extra Ancestrals on the draw due to Abzan missing one Thoughtseize, in addition to normal variance. The increased wins came from Stoneforge being a threat by itself and then finding another threat.

Abzan Sideboarding:

-4 Abrupt Decay
-1 Path to Exile

+1 Liliana of the Veil
+2 Duress
+1 Scavenging Ooze
+1 Sword of Fire and Ice

Additional disruption and another Sword to search for to make Spirits into real threats.

Jeskai Sideboarding:

-3 Remand
-1 Timely Reinforcements
-1 Vendilion Clique

+2 Spreading Seas
+1 Celestial Purge
+1 Wrath of God
+1 Wear // Tear

An answer was required for all my equipment. My opponent tried Stony Silence and found it dead too often to use. The lack of card draw hurt more than expected.

Limitations

No data set is ever perfect, and as a result no analysis will ever be perfect. There are limitations and flaws in any study, and unfortunately my testing was no exception. What I didn’t realize when testing began was how the different sideboards would impact matchups. Mystic lacking stock’s sweepers had a noticeable effect on the creature matchups, as did the extra discard against combo and control for Mystic. As a result the deviation between total matches won was fairly small, due to cards missing from both sideboards having greater-than-expected impact on the overall win percentage.

To account for that I will be focusing my analysis on the game one win percentages. Maindeck composition between the test decks is very similar and also isolates the impact of Stoneforge Mystic rather than Stoneforge plus sideboard cards, so it is more useful analytically.

Analysis

With our limitations in mind, let’s look at the important numbers together.

  • Stock vs. Infect Win %: 48% (24/50)
  • Mystic vs. Infect Win %: 50% (25/50)
  • Stock vs. Burn Win %: 40% (20/50)
  • Mystic vs. Burn Win %: 66% (33/50)
  • Stock vs. Ad Nauseam Win %: 42% (21/50)
  • Mystic vs. Ad Nauseam Win %: 40% (20/50)
  • Stock vs. Merfolk Win %: 36% (18/50)
  • Mystic vs. Merfolk Win %: 44% (22/50)
  • Stock vs. Death’s Shadow Win %: 46% (23/50)
  • Mystic vs. Death’s Shadow Win %: 44% (22/50)
  • Stock vs. Jeskai Win %: 48% (24/50)
  • Mystic vs. Jeskai Win %: 54% (27/50)

That’s still pretty messy. Let’s simplify things by tracking the change between versions.

  • Infect Win % Change: 2%
  • Burn Win % Change: 26%
  • Ad Nauseam Win % Change: -2%
  • Merfolk Win % Change: 8%
  • Death’s Shadow Win % Change: -2%
  • Jeskai Win % Change: 6%
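For anyone who wants to reproduce these changes from the raw game one counts, a minimal sketch follows. The dictionary layout and names are my own for illustration; the win counts are the ones listed above.

```python
# Game one wins out of 50 for each matchup: (stock Abzan, Mystic Abzan).
GAMES = 50
game_one_wins = {
    "Infect": (24, 25),
    "Burn": (20, 33),
    "Ad Nauseam": (21, 20),
    "Merfolk": (18, 22),
    "Death's Shadow": (23, 22),
    "Jeskai": (24, 27),
}

# Change in win rate, in percentage points, from adding Stoneforge Mystic.
change = {
    deck: round((mystic - stock) / GAMES * 100)
    for deck, (stock, mystic) in game_one_wins.items()
}

for deck, points in change.items():
    # With 50 games per sample, each game is worth 2 percentage points.
    print(f"{deck}: {points:+d} points ({points // 2:+d} games)")
```

Expressing each change as a number of games makes it easier to see how few results separate most of the matchups.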

Clearly Stoneforge had an effect on some matchups more than others. We can discount the 2% changes, as those represent only a single game’s difference, easily ascribed to normal variance. Jeskai, at only three games’ difference, is right on the cusp of being relevant. I will ascribe a weak impact there, with Merfolk having a moderate impact. However, Burn has been severely and unequivocally impacted by Stoneforge Mystic. In fact, if you group the aggressive decks together you get a total impact of +34% for Abzan (Burn’s 26 plus Merfolk’s 8, with the Infect and Death’s Shadow changes canceling out).
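One rough way to check which of these swings stand out from normal variance is a two-proportion z-test. The sketch below is not part of the original testing. It assumes the 50 games in each sample are independent and uses the pooled normal approximation, and the function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(wins_a, wins_b, n=50):
    """z statistic and two-sided p-value for the difference of two win rates.

    Uses the pooled normal approximation; both samples are assumed to be
    n independent games. Illustrative only, not the author's analysis.
    """
    p_a, p_b = wins_a / n, wins_b / n
    pooled = (wins_a + wins_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# (stock wins, Mystic wins) in game one, out of 50 each.
samples = {"Burn": (20, 33), "Merfolk": (18, 22),
           "Jeskai": (24, 27), "Infect": (24, 25)}
for deck, (stock, mystic) in samples.items():
    z, p = two_proportion_z(stock, mystic)
    print(f"{deck}: z = {z:+.2f}, p = {p:.3f}")
```

Under these assumptions only the Burn swing clears the conventional 0.05 threshold. The Merfolk and Jeskai swings are within what two 50-game samples could produce by chance, so they are best read as suggestive rather than established.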

That would point towards confirming our hypothesis that Mystic would slow the format down by preying on aggro decks. However, that is not the full story. We must also consider Sheridan’s results with Affinity, and those indicate a worrying trend.

If we group the decks by fairness and look at the match results again:

Fair

  • Burn: 26%
  • Merfolk: 8%
  • Jeskai: 6%

Less than Fair

  • Death’s Shadow: -2%
  • Infect: 2%
  • Affinity: 12%, based on Sheridan’s results vs. Frank Karsten’s expectations, reported by Sheridan as low-impact.

Unfair

  • Ad Nauseam: -2%
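The grouping above can be summarized with a quick calculation. This is a sketch only: the group means are my own summary statistic, not something the original analysis reports, and the names are illustrative.

```python
from statistics import mean

# Game one win rate changes in percentage points, grouped as in the list above.
# Ad Nauseam uses the -2 from the per-deck change list; Affinity's 12 comes
# from Sheridan's separate testing, so it is not directly comparable.
groups = {
    "Fair": {"Burn": 26, "Merfolk": 8, "Jeskai": 6},
    "Less than Fair": {"Death's Shadow": -2, "Infect": 2, "Affinity": 12},
    "Unfair": {"Ad Nauseam": -2},
}

for name, decks in groups.items():
    values = list(decks.values())
    print(f"{name}: mean {mean(values):+.1f}, "
          f"range {min(values):+d} to {max(values):+d}")
```

With only one to three decks per group these averages are crude, but they show the direction of the split: every fair deck moved toward Abzan, while the other groups are mixed or flat.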

Stoneforge Mystic affected fair decks far more often and more strongly than it did less-than-fair decks. This stands to reason when you consider that Batterskull is just a beater against combo and a much slower one compared to Tarmogoyf. Meanwhile it actively works against the aggro strategy and dominates the mid- to late-game. Even against midrange decks it is a strong, hard-to-kill threat. This split is corroborated, though not confirmed, by investigating the impact on the fair Death’s Shadow games vs. the unfair games.

  • Stock vs. Zoo style win %: 48%
  • Stock vs. Combo style win %: 42%
  • Mystic vs. Zoo style win %: 62%
  • Mystic vs. Combo style win %: 19%

Which yields an end result of:

  • Total change vs. Zoo style win %: 14%
  • Total change vs. Combo style win %: -23%

I doubt that Mystic is the actual reason the combo win rate got so much worse, but the results are the results. Adding Mystic to Abzan dramatically increased its win rate when Death’s Shadow played fair and hurt its chances at beating unfair attacks.

Impact

So what does all this mean? If my results accurately model real Modern, then it is fair to say that Stoneforge Mystic would not have an absolutely warping effect on the metagame. It is a powerful card but not truly degenerate, and it ultimately advantages fair midrange decks against aggressive decks.

The problem comes when we consider what kind of aggressive decks will feel the blow. Fair decks will be impacted much more strongly than unfair decks. Infect definitely doesn’t care about Batterskull any more than it does Tarmogoyf, and Affinity can care but it has plenty of options to get around it and win anyway. When Death’s Shadow is playing fair it cares as much as any Zoo type deck, but when it assembles its combo kill then Batterskull doesn’t matter. I would therefore expect them to try to combo more often.

The only reason that Batterskull would change an unfair combo matchup is by gaining more life than the combo can erase, which is hard considering how slow a clock Skull is compared to a turn two Tarmogoyf.

As a result, I would expect that in the wake of Stoneforge Mystic being unbanned there would in fact be a decrease in the total number of aggressive decks in Modern as Merfolk, Zoo, and Burn take a hit. This would slow things down as more players try slower decks with Mystic. However, after the initial slowdown, the format would accelerate as players notice that unfair decks aren’t affected. This will push players to play more Infect, Affinity, and combo decks and the aggro players will try to incorporate more unfair elements to fight back against Batterskull.

There is also the effect on other midrange decks to consider. The Jeskai results suggest that those decks that play Mystic will have an advantage over those that don’t. I suspect that had the Mystic deck run Painful Truths, Abzan would have been more strongly favored. The fact that Mystic still pushed it over Jeskai suggests that it would drive the format towards greater homogeneity. If you have to play Stoneforge to win, that does limit your deckbuilding options.

Could we adapt? Possibly. The Burn matchup would have been much more in Abzan’s favor if not for Destructive Revelry. Adding more targeted artifact removal might keep it in check, but I suspect that if players start doing that then Mystic decks will similarly adjust and run extra equipment and protection for it. This also doesn’t consider whether or not decks can afford the space with unfair decks running around.

Conclusion

Based on the results of my testing Stoneforge Mystic in Junk Abzan I recommend against unbanning. My results partially prove the hypothesis true, but analysis of the impact suggests that over the long term it will have the opposite effect.

While its power is manageable and it would give players more reason to play white, its impact would not be positive. It negatively impacts the viability of fair aggro decks and non-Stoneforge midrange decks, while having a negligible impact on the less fair decks. The likely outcome would be a shift to more unfair decks and the speed of the format increasing to try to ignore and invalidate Batterskull and Swords. Therefore there is no reason to unban Stoneforge Mystic.

I’m sure that many of you have questions about my conclusions, methods, or more specifics about how matchups played out. As always I am happy to discuss them with you in the comments. Next week, tune in for something completely different.

David began playing Magic during Odyssey block, quit playing Magic when Caw Blade ruled the world, and returned to Modern shortly before Deathrite was banned. He’s made an appearance at the Pro Tour, made money at GP Denver, and is constantly grinding and brewing in Modern.

29 thoughts on “Testing Stoneforge Mystic in Modern: Part Two”

  1. Pretty expansive data set there. I think your conclusion checks out well, and I’m glad you went through the higher-level analysis to note that decks such as Death’s Shadow and Infect would find themselves boosted by proxy (especially given that Burn’s decline would put a big dent in their major metagame predator). Do you have any further projects of this type planned, or was this one exhausting enough that you’ll put this sort of thing on the back burner for the time being?

    1. It was pretty exhausting, and I used up a lot of good will and favors to get all the testing done. Don’t expect anything like this from me again for some time, I doubt I’ll find enough experts willing to work on such a project for quite a while.

      I’m also not sure what I would test, most of what’s left on the list is there for a good reason or doesn’t need testing like Splinter Twin (No, Bloodbraid Elf is not a consideration, look at how good Jund is without it, don’t boost Jund). Suggestions?

      1. Hey David, really awesome experiment, truly.

        While I can’t say I’m happy with the results (I’d really like to see more White in Modern) it is good to know from a strategic and knowledge-seeking standpoint. It would have been good to see the Dredge match up, but I can understand that it wasn’t around at the time of testing. I would extrapolate it would play out similarly to Affinity.

        What I find interesting is that it does do a great job of fighting off fair aggro decks. As such, if a new weapon were introduced to fight off the “fair” decks, SFM might be a more reasonable unban. I’m not worried about homogenization so much. Most red decks run Bolt, most green decks run Goyf, etc. In fact, I think the reason White is largely considered the weakest color in the format is because it doesn’t have a defining card like the other colors do.

        In terms of suggestions, if you ever wanted to do something like this again, I would love to see a Preordain test. It’s the card I consider most reasonably unbanned in the format.

        Overall, really thorough and impressive job. You should be proud!

  2. Fascinating conclusion. Friends and I have discussed the safety of unbanning SFM, we considered unbanning multiple things such as JTMS and BBE before SFM; as those cards are much slower than SFM and require more setup and card selection in deck building. Twin being unbanned would help rein in the speed of the format, which needs to come first, before SFM can even be talked about. However, I do think that including Jund with x2 Kolaghan’s Commands would affect testing greatly. All in all – great study, can’t wait for more.

    1. The problem with Jund is that at the time Kommand was seeing very little play, and I barely see it at all now, so it would have had no impact in the actual testing. I also couldn’t find a willing Jund pilot.

      Should SfM be unbanned that would change, but I don’t think that the existence of answers is reason to unban a card if its impact is high enough.

  3. Awesome job. You put a serious amount of work into this so thanks for that.
    Im not gonna lie reading through this article the conclusion was quite the twist. I did not expect you arrived at that from the data presented.
    I can see where you are coming from, but it seems that the control deck had a fine time answering Stoneforge and even though really unfair decks had a slightly better match up saying that meta will turn to really unfair decks might be an exaggeration. I would have liked to see how a kolaghans command decks grixis/jund would fare against Stoneforge because that will determine if the statement is true that you have to play Stoneforge to be a competitive fair deck. Since those decks also have fine unfair match up

  4. Your hypothesis seems wrong to me. Something should be on the banlist if it violates the bannable criteria (turn four rule etc). Whether or not mystic slows down aggro is completely irrelevant. We dont ban cards for not slowing down aggro, so why would that be a rationale for unbanning?

    If anything your results make it look mostly harmless and not format warping. It deals a blow to merfolk and burn and doesnt do much vs combo decks and infect.

    You should have started with “why is this card banned?” because every white midrange deck would play it? Whoop dee doo every black midrange deck plays iok, every red midrange deck plays bolt, every green one plays goyf, etc etc.

    Whats changed since modern started? We have far better and maindeckable answers to artifacts. Decay k command revelry d sphere stony silence etc etc. Decks also do far more disgusting things on turn 3 now than play a 4/4 lifelink vigilance.

      1. Killing the germ makes them blow 8 mana to replay the batterskull. With an sfm on board its just four which is significant, but if you cant kill a 1/2 creature when it matters you probably arent winning a lot of matches anyways. Letting your opponent untap with a stoneforge on t3 is like letting them untap with a bob or young pyro – not something you ever want to happen.

  5. I disagree with this conclusion. Not enough decks were tested, and only two fair non-linear decks were tested. Missing decks include bant eldrazi, jund, junk variants, kikicord, grixis variants and eldritch evolution decks. These decks all interact with creatures, and win through the combat step. Without seeing how stoneforge affects these creature-centric midrange matchups, I can’t agree with the conclusion. I like the testing, I just would appreciate it if more deck types were included in the testing, to reach a more thorough conclusion.

    1. I mean, the thoroughness of testing means that getting more decks is a lot of pain. Additionally, it required good pilots for the decks it was matched against. I wonder if you could make a magic computer program that played as close to optimally as possible. The idea of being able to test hundreds of matches a minute seems interesting but somewhat scary (It would be very easy to find optimal builds.)
      As a Mardu player facing a lack of 2 drop my attention has shifted somewhat to a Bedlam Reveler focused build rather than a confidant one. Or maybe both if Im feeling a little lucky.

    2. Let’s try to be reasonable here. David played 600 matches to create this data set, over several months. The labor involved in adding 6 additional archetypes is hardly trivial. Of course he hasn’t made the definitive statement on Stoneforge Mystic in Modern, but these data contribute significantly to the debate.

    3. I mentioned this on another message board, but I’ll repeat it here. On the one hand, you’re not wrong that N could be larger and more decks could have been included. This is often the case with analysis like David’s. On the other hand, however, this is substantially more testing than Wizards will likely do for any unban. I don’t know whether they will consider results like these, but even if they don’t, the results could still be predictive of how Wizards will treat the issue with more limited testing. That is to say, even if N is smaller than a statistician or scientist would like, it’s still big enough to make unban suggestions from.

      1. Yeah, more testing isn't really reasonable, but the decks could have been chosen a bit better. If anyone expected Batterskull to swing the Ad Nauseam or Infect match, I'm not sure what to tell them. We likely did not need a hundred games of those to confirm that lifelink doesn't matter vs poison and is almost always inconsequential vs the Ad Nauseam burn spell finisher. Burn itself should be hit the hardest, being by far the most vulnerable to a 4/4 lifelink vigilance on t3. It's nice to get a number though (26 percent improvement for the SFM deck).

        The meta conclusion also bugs me. Is it a failure if a bunch of aggro decks get displaced by SFM midrange decks? Isn't it more diverse having a bunch of interactive SFM decks fighting Infect and Death's Shadow than having them fight other linear aggro like Burn and Merfolk? I'd also note Sheridan found SFM low impact despite a 12 percent improvement vs Affinity. If so, we probably shouldn't exaggerate the 8 percent uptick vs Merfolk.

    4. It is ALWAYS the case that more testing leads to stronger conclusions, just like it is always the case that testing is constrained by time, funds, and personnel availability. I think it was like the first paragraph where Ermenwein invited readers to continue this testing themselves… so go nuts.
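To put rough numbers on the sample-size point in this thread, here is a minimal sketch of the margin of error on a measured win rate. It assumes matches are independent and uses the normal approximation at 95% confidence; the function names are hypothetical and this is not a method used in the article.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (assumed level)

def margin_of_error(win_rate, matches):
    """Half-width of the approximate 95% interval around a win rate."""
    return Z_95 * math.sqrt(win_rate * (1 - win_rate) / matches)

def matches_needed(target_margin, win_rate=0.5):
    """Matches required per matchup to reach the target margin of error."""
    return math.ceil((Z_95 / target_margin) ** 2 * win_rate * (1 - win_rate))

# A 50-match sample of a roughly even matchup.
print(f"margin at 50 matches: +/-{margin_of_error(0.5, 50):.1%}")
# Matches needed to narrow that to five percentage points.
print(f"matches for +/-5%: {matches_needed(0.05)}")
```

Under these assumptions a single 50-match matchup carries a wide interval, which is consistent with the article's remark that a formal statistical study would need considerably more data.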

  6. Disappointed this didn't test against a single K-Command deck. One of the huge arguments against SFM is that it would kill all non-SFM fair decks, and that K-Command isn't enough to keep the matchup similar. Showing that Mystic did indeed help against aggro, whether they are “fair” or “unfair” aggro, shouts to me that she is a good unban. A huge advantage in the Jund/Grixis matchup would be enough for me to be against the unban, but considering the Jeskai matchup was nearly unaffected (keeping in mind this deck can run SFM itself and didn't for this test), I find it hard to believe Kommand wouldn't keep these matchups even.

  7. Well, that must have taken quite a bit of work, well done. Your testing makes perfect sense and confirms what many suspected: unbanning powerful fair cards will probably lead to an increasingly linear meta in order to avoid having to deal with them. This effect would be magnified with something like JTMS, which essentially adds Burn, Zooicide, etc. to the list of decks it’s not very effective against.

    Essentially SFM wants you to go over her with something like Tron or a game-winning combo, while JTMS wants you to go either under (Burn, Infect, etc.) or over him (Tron), but both give a clear message: to beat us, play linear.

  8. Again, thanks a lot for the great amount of work that was put into this. I have a few questions, and I cannot help but feel the conclusion was maybe a little biased, or at least an interpretation of the numbers that is not what I see from the data presented here.

    So basically, to recap: we confirm that Stoneforge helps fair midrange and control decks against fast aggro decks, which was the hypothesis. The data does not show that control without Stoneforge would be crushed by the card. This doesn’t even take into account Grixis/Jund, which have natural answers to it, only the fact that the control deck can win even without special answers. Furthermore, we confirm that Stoneforge does not do too much against linear combo decks, which we already expected, since Batterskull is only a beatstick there. So to conclude that the meta will move towards a more unfair meta is at best a guess and is not supported by the numbers presented here. This doesn’t take into account the other decks that could emerge from the meta slowing down, so it is impossible to predict the meta based on the data here.

    All I can see is:
    Yes she helps midrange/fair decks against fast aggro decks and that would help slow the format down. And yes you can be competitive without her in your deck. I honestly fail to see how you can argue that isn’t good for the format and she should stay on the banlist.

    1. The hypothesis was for all aggro decks, not just the fair ones. The aggro decks that weren’t fair were not measurably impacted by Stoneforge Mystic. Therefore we cannot fully confirm the hypothesis, and on that basis cannot definitively say that Stoneforge would be safe to unban.

      Part of the hypothesis also claimed that there would be more room for fair midrange decks in Modern and I cannot confirm that due to the negative impact that the Stoneforge-less Jeskai deck saw. It’s not definitive evidence, but if Jeskai is harmed by not playing Stoneforge it stands to reason that other midrange decks without it would also be harmed.

      Furthermore, when we consider the fact that the faster unfair aggro decks were unaffected as was the combo deck, we can extrapolate that the correct decision in a Stoneforge filled metagame would be to play a fast unfair deck.

      Putting this together, what I expect to see is a decrease in fair aggro decks and a rise in Stoneforge decks. This will lead to a rise in unfair decks that are unaffected, and this trend towards unfair decks will negatively impact the format. How long or sustained that would be is impossible to say, but if we see the format moving towards the unfair end of things I think that’s a negative impact.

      1. Thanks a lot for the reply.
        I think your reasoning is a little flawed, since we could almost already tell that a slow 4/4 wouldn't do much against combo decks and “aggro combo” decks. When you put it like this it gets almost impossible to evaluate the card, since this can be said of a lot of cards. But it did help in the matches we hoped it would help.
        The Jeskai deck was only slightly affected: 48% to 54% in game one versus 50% to 52% in total matches. This suggests that the deck doesn't get eaten alive by the card and you can be competitive without the card.
        So the real question that should be asked is whether the card allows other midrange/control decks to flourish. Because as far as aggro and combo go, it did IMO what it was supposed to.

        I would argue that the meta would shift to a more fair meta in the long run, since Grixis Control would probably be the biggest winner from an SFM unban. It faces fewer of its worst matchups in Burn and, to a lesser extent, Zoo. It can play Visions (80% win rate when suspended on 1 against Mystic), play good cards to counter the SFM decks, and still have a lot of tools against the really unfair decks in counters and good removal. I could even see Jund coming out strong from this as well, with good removal, discard and Goyf, which laughs at a small 4/4. Hell, even Bant Eldrazi comes off strong from this, since it's decent against some really unfair decks but can struggle against some fast decks. Mystic is quite lackluster against them on paper, with Displacer for the token, Thought-Knot to take the equipment, big creatures, and I could go on.
        Those are the interesting decks, since they will determine how oppressive SFM is against midrange decks. If the card doesn't allow them to exist and you have to play SFM to be competitive, then I agree it should stay on the banlist. But by my hypothesis this is not the case and the card should be unbanned.

        As I said earlier, I don't mean to take anything away from your work, since you did an awesome job; this is simply to start a discussion.

  9. I dislike one part of the methods.
    The number of Mystic decks used to test against enemy decks should have been very high initially, so you could find the Mystic deck that had the best chances against the test decks to start with.
    Then, when you found something solid, you could start the heavy testing.

    I do however like this new trend of “mega-meta testing” which I may have helped form with my own rogue deck products (I test each of my decks for years) that I post on forums.
    (While known as wickeddarkman)

  10. Hello David! Thank you for this great article and the effort you and the other testers have put into this to get this amount of data! I am not a native speaker, but I hope you will make the effort to read this wall of text and can work out what I am trying to say.

    Looking over the data, I come to a different conclusion. So if you don’t mind, I would like to add a bit of context:
    First of all, you tested Stoneforge Mystic in Abzan as Sheridan Lardner did, reasoning that its near-Tier 1 presence makes it a logical, proven best-deck contender that can adopt Mystic without splashing a color.

    However, to measure the impact of the card I think we need to incorporate the metagame.
    Abzan attacks unfair decks from the same angle as Jund, which we know is not very useful at the moment, as the meta has become more linear even though these decks exist and are prevalent.

    With blue control slowly vanishing, answers that can regulate unfair decks from a different angle (counterspells) are dropping in numbers. As unfair Modern decks are resilient enough to only be disrupted temporarily, decks like Esper and Jeskai have proven excellent at extending their time window to win, but fail to exploit the window they opened. Stoneforge Mystic could address these issues for blue-based decks.

    So testing with Abzan might not be the optimal choice, as Abzan basically gains a powerful addition to the already powerful Tarmogoyf, but this adds nothing to the deck that it didn't have before. Jeskai and Esper simply lack that option.

    Additionally, it reads like you based your conclusion on the pre-board match data. But when I look at the Jeskai matchup including post-board data I don’t see a significant impact:

    Stock Abzan:
    Game One Win %: 48% (24/50)
    Total Matches Won: 50% (25/50)

    Mystic Abzan:
    Game One Win %: 54% (27/50)
    Total Matches Won: 52% (26/50)

    While the pre-board matchup seems to have improved significantly, the post-board matchup must have been worse by a similar amount. In addition, since the average game in midrange matchups runs longer, the matchup is easier to fix with the sideboard: you see your few sideboard cards more often when the games are longer.
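As a rough check on how much weight the Jeskai numbers above can bear, the two pairs of win rates can be run through a pooled two-proportion z-test. This is a minimal sketch, not anything the article did: it assumes the matches are independent and that the normal approximation is acceptable at 50 matches per deck, and `two_proportion_z` is a hypothetical helper name.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled two-proportion z-test: returns (difference, z, two-sided p)."""
    rate_a, rate_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value

# Stock Abzan vs Mystic Abzan against Jeskai, figures quoted above.
game_one = two_proportion_z(24, 50, 27, 50)
matches = two_proportion_z(25, 50, 26, 50)

for label, (diff, z, p) in (("game one", game_one), ("matches", matches)):
    print(f"{label}: diff={diff:+.0%} z={z:.2f} p={p:.2f}")
```

Under those assumptions neither the game one shift nor the match shift comes near a conventional significance threshold, so both readings of this matchup rest on differences that chance alone could plausibly produce.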

    As this is the only midrange/control matchup data available, I can only partially agree with the statement
    “It negatively impacts the viability of fair aggro decks and non-Stoneforge midrange decks, while having a negligible impact on the less fair decks.”
    because the data does not support the claim that it would reduce the viability of non-Stoneforge midrange decks. The impact on less fair decks in the meta cannot be concluded, since adding Stoneforge Mystic to Abzan adds neither a new angle of attack nor more pressure against these decks than before.

    “The likely outcome would be a shift to more unfair decks and the speed of the format increasing to try to ignore and invalidate Batterskull and Swords.”
    The format is already linear, and linear decks only have two choices when a new threat is introduced into the format: ignore it and go under it when the effect on the win % is minor, or try to invalidate it when it cannot be ignored (e.g. Destructive Revelry versus Leyline of Sanctity in Burn). If Stoneforge were introduced into the format, the meta could also temporarily shift away from linear strategies in order to explore the strength of Stoneforge Mystic and then return to its previous state when the impact proves too low, at worst switching the go-to midrange deck from Jund/Abzan to Jeskai/Esper/Bant.

    The same argument regarding Midrange decks could be made about Tarmogoyf:
    “It negatively impacts the viability of fair aggro decks and non-Tarmogoyf midrange decks”

    Again, I really appreciate all the work you’ve put into this, this is an excellent read and really a great contribution to further discussion!

  11. Another limitation in your testing is you tested stock decks *from a metagame where SFM doesn’t exist* against an SFM deck. As with any powerful card or strategy, the existence of SFM may cause small changes in deck construction which limit the effectiveness of the new threat (in this case, SFM). For example, Kolaghan’s command decks like Jund and Grixis have been around for a while, but the popularity of Kolaghan’s Command itself has been declining. Perhaps in an SFM-legal world, these strategies achieve success by increasing their counts of Kolaghan’s Command. These kind of adjustments aren’t a bad thing and can actually be healthy for a format. These shifts are near impossible to identify during this type of testing. Bear in mind, I have no data to say this would be what happens were SFM made legal, I’m merely using the example to demonstrate my point.

    Lastly, there is nothing intrinsically wrong with having a common win-condition in a color, or strategy. Saying that SFM would become the reason to play white or that decks would splash white for SFM is akin to saying decks play green for Tarmogoyf, red for Lightning Bolt, black for Inquisition of Kozilek, or blue for Snapcaster Mage. Having to pass the SFM-test isn’t significantly different than having to pass the Bolt-test. In fact, these common themes help aid sideboarding decisions (“this is how I deal with SFM strategies” is the same as “this is how I deal with big mana strategies”). Unifying the threat base for a subsection of decks can actually reduce the strain that Modern players have on their sideboard, because specific sideboard slots are more useful in a wider array of match-ups.

    This point obviously assumes that SFM doesn’t become the only strategy in Modern, and I believe your testing showed ample evidence that this would not be the case.

    1. I’m actually impressed by all the work you’ve put in to reach such results. However, since I have some academic background in data analysis and research, a few aspects bother me…

      – You missed some key decks while testing

      Bant Eldrazi is the main one, but Tron and Scapeshift (the BTL version), natural prey of midrange decks, should have been taken into account as well; if not tested, then at least noted as something that might change your results.

      – Other decks would pack Mystic as well

      I see no reason for Jeskai decks to divide into two different versions: Nahiri and “Jeskai Blade” decks. That by itself, can shape the whole meta you’re trying to predict.

      – The lists aren’t fine-tuned to such a meta.

      Just like when you had Splinter Twin, some cards would see more play than others. Abrupt Decay numbers would rise, for instance, thus affecting the results as well. That is just a single example.

      In short, it was very useful work, but I think it’s very audacious to conclude that unbanning Mystic would result in a massive metagame shift.

      Valakut, Bitterblossom, Sword of the Meek directly contradict these conclusions, and they were all supposed to be broken and oppressive decks.

      Don’t get me wrong, it was excellent work, what you did here. But I think your results shouldn’t give you such conclusions…

      1. While I appreciate the criticism on this point, as has been pointed out before both here and on reddit, what I have is the result of 600+ matches played over nearly three months. I know this isn’t as thorough as can be; that’s why I noted my limitations and said that my conclusions were based on my results, not the definitive answer. I saw fair decks punished and unfair decks unaffected. Trying to do any more would add months to this project, and at some point you just have to publish.

        On some specific points:
        Tron matchups would be unaffected because Tron comes online the same turn that Stoneforge could be activated and mostly doesn’t care about 4/4’s on three (see also Loxodon Smiter). Bant Eldrazi had not been proven as a metagame fixture when I began the project and I only wanted proven decks. I also only tested against experts on their deck and there was never a Scapeshift pilot willing to participate.

        The next two I’ll address together: That adds too many variables. To make this as applicable as possible to the actual metagame I needed proven decklists. Trying to add Mystic to lists or adapt existing lists to a Mystic filled metagame means that I am adding in the variable of my own deckbuilding skill, and that would necessarily make my results weaker. Especially if I make a mistake and don’t detect it early enough.

        1. Yeah, I understood why you didn’t include Bant Eldrazi in your analysis, hence the reason I wasn’t criticizing that as well. The Scapeshift omission is also well explained, because a subpar pilot would ruin all your data. I can agree with that.

          Again, I actually lean towards the theory that Mystic wouldn’t warp Modern, but again, I do not have the data to support it. I just thought, as a fellow investigator, that it was… bold to reach such a conclusion. But I understand that this piece of research would have to have a conclusion either way.

          And again, wonderful piece of research and article(s).

  12. Eldrazi Displacer would enable someone to really abuse Stoneforge Mystic.
    Add it to a UW control shell and soon you will have plenty of rare swords with protection from two colors as well as Batterskulls. If that won't work, something similar will.
