Original article: https://engineering.talkdesk.com/double-trouble-why-we-decided-against-mocking-498c915bbe1c

Author: André Carvalho, Feb 25, 2020

“So what if we mock?”

(Image retrieved from https://bit.ly/2P2jPHd)

There’s a well-known science joke about a dairy farmer who asks a theoretical physicist to help him solve his farm’s low milk production problem.

The physicist returns with an answer that comes with a caveat: “it only works for spherical cows in a vacuum.”

This illustrates an essential part of both physicists’ and programmers’ jobs: the concept of abstraction.

All software building, and the good practices that come with it, hinges on correctly and progressively abstracting away the details of a given problem.

So it stands to reason that we should also abstract away the implementation details in our tests. A good unit test should be repeatable and easy to implement, so why should we care about the gritty details that reality throws at us?

That’s what test doubles are for. If we were, say, testing a class that milks cows, we could use a sphere that simulates the ability to be milked instead of an actual cow. It would be much simpler, with far fewer chances of the cow kicking you violently, and besides, no one would condone testing on animals.

The flip side is, of course, that you have no guarantee an actual cow won’t kick the milk bucket in production, and there’s no use crying over spilled milk.

Some cowntext

We are sorry if we misled anyone: this is not an article about cattle, spherical or otherwise, but about software testing. In particular, it is about unit testing and the over-reliance on test doubles, topics that arose within the Atlas team at Talkdesk.

Atlas is a platform we’re building to make our development quicker and more flexible through a micro-frontend approach, and to give our end customers a more customizable and easily extensible experience.

We generally use an OO paradigm and most of our codebase is written in plain (if you consider ES6 plain) JavaScript, atop a React frontend.

Even though we would strongly argue that the underlying rationale of this article holds regardless, keep in mind our starting point: JavaScript is a dynamic, loosely typed language. And dynamic, loosely typed non-cows are the hardest to manage. But let’s get down to business.

SUT up: some Unit Test concepts

So what is a unit? The answer is not as simple as it may seem but, if it helps, you can define it as the smallest testable piece of software in the System Under Test (SUT).

In the context of OO programming, it often refers to the class under test, and for the purposes of this post, it’s enough to think of it as such.

Test doubles consist of any code that stands in for the real implementation in a test. Just like movie stunt doubles, they take the fall for you.

Though the word mock is often used interchangeably with test double, mocks are only one of several flavors. Mocks provide a given behavior depending on a set of expectations, but there are alternatives. To name a couple: stubs are functions that provide a canned answer regardless of the call value (you can read more on the difference between these and mocks from the inevitable Martin Fowler), and spies can be defined simply as function wrappers that provide information on function calls.
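To make the three flavors concrete, here is a minimal hand-rolled sketch in plain JavaScript. The `Milker` class, `spyOn` and `createBucketMock` are hypothetical names invented for this illustration; they are not Atlas code and not the API of any testing library.

```javascript
// Hypothetical SUT: asks a cow for milk and pours it into a bucket.
class Milker {
  constructor(cow, bucket) {
    this.cow = cow;
    this.bucket = bucket;
  }

  milk() {
    const liters = this.cow.giveMilk();
    this.bucket.pour(liters);
    return liters;
  }
}

// Stub: returns a canned answer no matter how it is called.
const cowStub = { giveMilk: () => 2 };

// Spy: wraps a function and records the arguments of every call.
function spyOn(fn) {
  const spy = (...args) => {
    spy.calls.push(args);
    return fn(...args);
  };
  spy.calls = [];
  return spy;
}

// Mock: carries its own expectations and fails when they are not met.
function createBucketMock(expectedLiters) {
  let poured = false;
  return {
    pour(liters) {
      if (liters !== expectedLiters) {
        throw new Error(`expected ${expectedLiters} liters, got ${liters}`);
      }
      poured = true;
    },
    verify() {
      if (!poured) throw new Error("expected pour() to be called");
    },
  };
}

// Stub + spy: inspect the recorded calls after the fact.
const pourSpy = spyOn(() => {});
new Milker(cowStub, { pour: pourSpy }).milk();
console.log(JSON.stringify(pourSpy.calls)); // [[2]]

// Stub + mock: the expectation lives inside the double itself.
const bucketMock = createBucketMock(2);
new Milker(cowStub, bucketMock).milk();
bucketMock.verify(); // throws if pour(2) never happened
```

The stub and the spy know nothing about what the test expects, which is why they tend to be the least compromising choice when they are enough.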

Many traits separate the wheat from the chaff in unit tests, and many articles pinpoint them, saying the same thing in different words. We can sum it all up: unit tests need to be reliable, maintainable and readable. And while mocks and other test doubles may look helpful, they can make all three harder to achieve as your codebase, the number of unit tests and your tech debt grow.

Their initial convenience made us over-dependent on them. We were building Atlas from scratch and while testing new code we gladly leaned on mocks.

Only when we decided to do a big refactor did we realize that we had made such extensive use of test doubles that our tests had become unreliable. When you refactor you alter the code, but the mocks are impervious to that change and keep the deprecated interfaces. Our tests lost relevance because they didn’t depend on the actual code but on a test double version of it: they worked for spherical cows in a vacuum.

That’s when we agreed to do something about it, and also when we found out we were embroiled in a long-standing battle, a centuries-old feud that makes the conflict between Lannisters and Starks look like child’s play. We can say it is the dispute between the Capulets and the Montagues of Test Driven Development (TDD).

The Classicists vs. The Mockists

Ok, we may be overselling the rivalry a bit, but there are two schools of TDD, usually known by the cities where they were supposedly created: London and Detroit.

The London style is top-down: you start building your software from the most generic component and then refine the abstractions you depend on.

If you develop using the London TDD style, you have no choice but to mock your dependencies in your tests: what you’re testing depends on code that has not been written yet.

Meet the contender representing London, the Mockist Sandi “Ruby” Metz!

In a talk at RailsConf 2013, Sandi gave several simple and straightforward pointers on how to write good unit tests.

One such aphorism we particularly like is “be a minimalist.” We recommend watching the talk if you haven’t before, but in case you’re in a rush, it can be summarized by this table:

(Image retrieved from https://youtu.be/URSWYvyc42M)

What Sandi means by minimalism is: don’t test what you don’t need to be testing.

From an OO perspective, objects communicate with each other via messages. A Mockist would say: “only care for the messages! Test the message, not the messenger!”

Sandi says you can ignore any of the messages that do not have an impact outside of your unit. Because remember, your unit is already the smallest testable part of your code.

For everything else, you can replace your dependencies with mocks because all you care about are the messages they exchange, i.e. that they respect their defined interface.
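A minimal sketch of what testing the messages can look like in plain JavaScript. The `MilkTank` class and its `alarm` collaborator are hypothetical; the labels in the comments (incoming query, incoming command, outgoing command) borrow the vocabulary of the talk.

```javascript
// Hypothetical unit under test.
class MilkTank {
  constructor(capacity, alarm) {
    this.capacity = capacity;
    this.alarm = alarm;
    this.liters = 0;
  }

  // Incoming query: returns a value and changes nothing. Assert on the result.
  level() {
    return this.liters / this.capacity;
  }

  // Incoming command: changes state that is visible through the public
  // interface. Assert on that direct, public side effect.
  add(liters) {
    this.liters = Math.min(this.capacity, this.liters + liters);
    if (this.liters === this.capacity) {
      // Outgoing command: a message that has a side effect in another object.
      // The test only checks that the message was sent.
      this.alarm.tankFull(this.capacity);
    }
  }
}

// Minimal recording double for the alarm: it only captures the messages.
const sent = [];
const alarm = { tankFull: (capacity) => sent.push(capacity) };

const tank = new MilkTank(10, alarm);

tank.add(4);
console.log(tank.level()); // 0.4
console.log(sent.length); // 0

tank.add(6);
console.log(sent.length); // 1
```

Nothing here asserts on how the alarm reacts; that belongs to the alarm's own tests.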

But could we be minimalistic with our mocks? The Detroit style of TDD works bottom-up: you start building and testing the smallest components and then develop more complex components from those smaller parts.

Unlike the London style, your dependencies are already there when you start testing, so you don’t need to mock them. In fact, there’s no need to mock anything! Right? Riiiight?

More or less. Meet the contender representing Detroit, the Classicist Robert “Uncle Bob” Martin!

What Uncle Bob suggests is to mock across “architecturally significant boundaries.” This quote is the knockout move when it comes to the use of test doubles.

Tests should be repeatable and quick to run so you should never depend on a faulty server connection or access to a database.

But you shouldn’t need to mock every class you depend on either, especially if you’re responsible for maintaining both and they are tightly coupled.
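One possible reading of that advice, as a sketch: double only the boundary (here a database-like repository) and let the unit use its real, in-house collaborator. `HerdReport`, `YieldStats` and the repository interface are hypothetical, and the repository is kept synchronous for brevity even though real database access would normally be asynchronous.

```javascript
// Real collaborator that we own and maintain: used as-is in the test.
class YieldStats {
  average(liters) {
    if (liters.length === 0) return 0;
    return liters.reduce((sum, value) => sum + value, 0) / liters.length;
  }
}

// SUT: depends on YieldStats (ours) and on a repository (the boundary).
class HerdReport {
  constructor(repository, stats = new YieldStats()) {
    this.repository = repository;
    this.stats = stats;
  }

  summary(farmId) {
    const liters = this.repository.dailyYields(farmId);
    return { farmId, average: this.stats.average(liters) };
  }
}

// Test double for the boundary only: an in-memory stand-in for the database.
const inMemoryRepository = {
  dailyYields: (farmId) => (farmId === "farm-1" ? [20, 22, 24] : []),
};

const report = new HerdReport(inMemoryRepository);
console.log(JSON.stringify(report.summary("farm-1")));
// {"farmId":"farm-1","average":22}
```

If `YieldStats` changes its behavior or interface, this test breaks along with it, which is exactly the feedback a mocked `YieldStats` would have hidden.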

Clean up your act

What we found out, as we rewrote our tests to make less extensive use of test doubles (and we’re hardly the first ones to point this out), is that the need for mocks is actually a code smell.

Mocks represent the dependencies of your unit. So if the reason you want to write mocks is that your unit has a lot of dependencies, or your unit’s dependencies have a lot of dependencies, then mocking is not the way to go.

If you want to simplify your tests, don’t use doubles — rewrite your code instead.

Ever since we decided to adopt an “avoid mocks” mindset, we have been finding opportunities to make our code better.

We’ll give you a recent example, in which we were adding a Menu class to an already messy bundle of interdependent units:

An arrow represents the dependency; in yellow the last class to be added (diagram by Talkdesk)

To test the Menu class in a mock-free way, we would have had to instantiate not only Navigation but all its dependencies.

That is a hard test to build (the diagram is already a simplification of reality), and an easy solution would be just to mock Navigation — the mock can hide all the other dependencies, and the Menu itself only depended on Navigation.

However, that would be sweeping tech debt under the rug. Bad code is rarely the result of one day of bad programming by a single coder. More often it is the bastard child of small increments by a growing team of well-intentioned programmers that progressively lose track of the overall picture.

It may be because they don’t feel confident rewriting the existing code or because the problem was not apparent.

But avoiding test doubles will make the problem visible and give you more confidence to refactor old code.

Instead of mocking Navigation, we decided to figure out what those classes had in common and use an independent state store to manage it.

So, in the end, our dependency diagram looked like this:

Now, hello handsome (diagram by Talkdesk)

Now the only class we had to mock, no matter which unit we were testing, was the Location State Store. It’s a pretty dumb class that has only one job: to be the source of truth of the common location state.
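The shape of that refactor might look roughly like the sketch below. Only the class names `Menu`, `Navigation` and Location State Store come from the text above; every method, field and path is an assumption made for illustration, not the actual Atlas implementation.

```javascript
// Hypothetical sketch: the single source of truth for the current location.
class LocationStateStore {
  constructor(initialPath = "/") {
    this.path = initialPath;
    this.listeners = new Set();
  }

  get() {
    return this.path;
  }

  set(path) {
    this.path = path;
    this.listeners.forEach((listener) => listener(path));
  }

  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener); // call to unsubscribe
  }
}

// Navigation and Menu no longer know about each other, only about the store.
class Navigation {
  constructor(store) {
    this.store = store;
  }

  goTo(path) {
    this.store.set(path);
  }
}

class Menu {
  constructor(store, items) {
    this.store = store;
    this.items = items;
  }

  activeItem() {
    const path = this.store.get();
    return this.items.find((item) => item.path === path) || null;
  }
}

const items = [
  { label: "Calls", path: "/calls" },
  { label: "Reports", path: "/reports" },
];

// Testing Menu in isolation only needs a trivial stub of the store.
const storeStub = { get: () => "/reports" };
console.log(new Menu(storeStub, items).activeItem().label); // Reports

// The store is dumb enough that using the real one is just as easy.
const store = new LocationStateStore();
new Navigation(store).goTo("/calls");
console.log(new Menu(store, items).activeItem().label); // Calls
```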

Immediately our tests became a lot simpler. Even without relying on mocks, in the long run, this code will become easier to extend, debug and maintain.

Avoiding test doubles forces our units to have a single and well-defined responsibility — no test doubles means bad code will be twice as hard to test.

It also makes us feel safer when refactoring, because when we change any given unit, the tests of the other units that depend on it will also break until we fix them as well.

It is an art

And by art we mean, there’s no strict rule that can replace experience.

Here at the Atlas team, we are fans of Kent C. Dodds’ React Testing Library. He wrote: “write tests, not too many, mostly integration.”

The sentiment that drove the decision to replace Enzyme with Kent C. Dodds’ library is the same that motivated this article. By avoiding mocks, we are indeed blurring, if ever so slightly, the line between integration and unit tests.

With Enzyme’s shallow rendering you’re testing bits of React, rather than the actual code that React cobbles together and that your application runs on.

In the same way, when using mocks, you’re making your tests hinge on a mere reflection of the actual code they depend on.

You can change the behavior or the interface of a class, but if you forget to keep all the mocks consistent, the units that depend on it will still pass their tests, because they are being tested against a mock. You get false positives.
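A small sketch of that drift in plain JavaScript. `Cow`, `Dairy` and the renamed method are hypothetical, made up to show the mechanism.

```javascript
// Real class after a refactor: milk() was renamed to giveMilk()
// and now returns an object instead of a number.
class Cow {
  giveMilk() {
    return { liters: 2 };
  }
}

// Stale hand-written mock that still exposes the pre-refactor interface.
const staleCowMock = { milk: () => 2 };

// A dependent unit that nobody remembered to update.
class Dairy {
  constructor(cow) {
    this.cow = cow;
  }

  collect() {
    return this.cow.milk();
  }
}

// Against the mock, the test keeps passing...
const mockedResult = new Dairy(staleCowMock).collect();
console.log(mockedResult); // 2

// ...while the same code blows up against the real class.
let realError = null;
try {
  new Dairy(new Cow()).collect();
} catch (error) {
  realError = error;
}
console.log(realError instanceof TypeError); // true
```

In a dynamic, loosely typed language nothing ties `staleCowMock` to `Cow`, so only a test that exercises the real class notices the rename.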

We are not, however, about to dismiss all mocks: we’re striving for balance.

With experience, meaning art, Classicists and Mockists will come to the same conclusion. Test doubles sometimes are necessary, sometimes they are a pain.

You can read Kent C. Dodds advocating for mocks, reminding us that “when you mock something, you’re making a trade-off”, and you can watch Sandi Metz (in the aforementioned talk) sharing how hard it is to maintain mocks and avoid API drift. Whatever side you end up choosing, be smart about it.

Keep this in mind

We prescribe reduced use of test doubles while recognizing they are important. We’ll stress again Uncle Bob’s suggestion of mocking across “architecturally significant boundaries.”

Take this into account:

Use the full extent of existing test doubles. Not only mocks but stubs. And very often spies. Use the simplest and least compromising test double that does the trick.
Mock network calls, database accesses, external libraries whose behavior you don’t want your SUT to depend strictly on. Mock anything that is unreliable or will make your tests run for too long.
If your test is getting too complicated to write, it may be a code smell. Take a look at your code, check whether there’s any fault in it (whether it adheres to SOLID principles, for example) and, if there is, refactor. But then, if you’re convinced the problem is not architectural, use test doubles.

The cost of writing tests should not dwarf what you get from them, and hopefully, unit tests are only a layer of the stack of tests you’re running. Just be wary of spherical cows.


Double trouble: why we decided against mocking (translation, work in progress)