Technical Blog

Appreciating the Opposition: From Fencing to Feedback

Posted on Friday, August 22nd, 2014 by

My fencing master used to tell me that the quickest way to get to know somebody is to fight him or her. Thanks to him, I somehow managed to stay alive through several duels, and I can now reflect on his advice as it relates to my job: giving and receiving feedback. On the surface, feedback and fencing might seem pretty different. But similar to a duel in fencing, exchanging feedback...

Making Friends With Types

Posted on Thursday, August 14th, 2014 by

On August 8th and 9th, I had the pleasure of attending the Scala by the Bay conference as a speaker. In my talk, “Reasoning With Types,” I discussed how we, as developers, can approach types not as something “just for the compiler” but rather as a tool for us to reason about code. A good chunk of Box’s codebase is written in Scala, a programming language exceptional in its ability to...

Tags: Scala types

A Tale of Postmortems

Posted on Monday, August 11th, 2014 by

Site issues are a part of life for most web application shops. Database errors, buggy code, vendor failures, and growing pains rear their heads and keep engineers up at night. At Box, we’re no exception, and over the years we’ve done our fair share of triaging and solving site issues. This is a story about the evolution of site outages at Box, a grassroots campaign to scorch our tech...

Tags: ops performance postmortems

Apache Spark in Resource Constrained States

Posted on Tuesday, July 29th, 2014 by

This is part two of a two-part series; read part one, “Evaluating Apache Spark and Twitter Scalding.” Extending the work done there, we sought to evaluate Spark* in resource-constrained clusters. We used the same benchmarks as in the previous post while manipulating cluster configurations to validate claims of graceful degradation of RDDs. The cluster setup is similar to the one in the previous post, with one master...
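
The graceful degradation being tested concerns cached RDDs that do not fit in memory. Spark's programming guide describes the default MEMORY_ONLY storage level this way: partitions that do not fit are not cached and are recomputed from their lineage each time they are needed. The minimal Python sketch below imitates that behavior with a hypothetical `ToyCachedDataset` class. It is not Spark's API and has no eviction policy (it simply caches partitions until it runs out of room), and it assumes each partition can be rebuilt from a pure `compute` function standing in for lineage.

```python
from typing import Callable, Dict, List


class ToyCachedDataset:
    """Hypothetical toy model of a cached, partitioned dataset (not Spark's API).

    Up to `capacity` partitions are kept in memory. Any partition that does
    not fit is recomputed from `compute` (a stand-in for RDD lineage) every
    time it is read, loosely mirroring Spark's MEMORY_ONLY storage level.
    """

    def __init__(self, num_partitions: int,
                 compute: Callable[[int], List[int]], capacity: int) -> None:
        self.num_partitions = num_partitions
        self.compute = compute        # rebuilds partition i from scratch
        self.capacity = capacity      # max number of partitions held in memory
        self.cache: Dict[int, List[int]] = {}
        self.recomputations = 0       # how many times compute() was called

    def partition(self, i: int) -> List[int]:
        if i in self.cache:
            return self.cache[i]
        self.recomputations += 1
        data = self.compute(i)
        if len(self.cache) < self.capacity:
            self.cache[i] = data      # cache only while there is room; no eviction
        return data

    def total(self) -> int:
        """An 'action' that reads every partition once and sums all values."""
        return sum(sum(self.partition(i)) for i in range(self.num_partitions))


def make_partition(i: int) -> List[int]:
    # Partition i holds the ten integers 10*i .. 10*i + 9.
    return list(range(i * 10, (i + 1) * 10))


for capacity in (8, 4, 0):
    ds = ToyCachedDataset(8, make_partition, capacity)
    results = [ds.total() for _ in range(3)]  # run the same action three times
    print(capacity, results, ds.recomputations)
# prints:
# 8 [3160, 3160, 3160] 8
# 4 [3160, 3160, 3160] 16
# 0 [3160, 3160, 3160] 24
```

In this toy, shrinking the cache never changes the answer; it only increases how many partitions must be rebuilt, which is the kind of gradual slowdown, rather than outright failure, that graceful degradation refers to.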

Tags: Apache Spark benchmarks Hadoop

Evaluating Apache Spark and Twitter Scalding

Posted on Thursday, July 24th, 2014 by

In general, there are two classes of frameworks to consider when building a machine learning system: in-memory and disk-based. Disk-based frameworks such as Hadoop MapReduce persist intermediate values to disk, allowing computation over massive datasets with less risk of running out of working memory. In-memory frameworks, on the other hand, attempt to circumvent the heavy cost of I/O...
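
The distinction can be made concrete with a minimal Python sketch. Both hypothetical functions below run the same chain of stages: `run_disk_based` writes each intermediate result to a temporary file with `pickle` and reads it back before the next stage, loosely in the style of MapReduce, while `run_in_memory` passes results directly between stages. Neither reflects Hadoop's or Spark's real APIs, and counting bytes written is only an assumed proxy for I/O cost. The toy also loads each intermediate fully into memory, so it shows the extra I/O of the disk-based approach but not the memory savings real frameworks gain by streaming records.

```python
import os
import pickle
import tempfile
from typing import Callable, List, Tuple

Stage = Callable[[List[int]], List[int]]


def run_in_memory(data: List[int], stages: List[Stage]) -> List[int]:
    """Hand each stage's output straight to the next stage, in RAM."""
    for stage in stages:
        data = stage(data)
    return data


def run_disk_based(data: List[int], stages: List[Stage]) -> Tuple[List[int], int]:
    """Persist every intermediate result to disk, then read it back.

    Returns the final result and the total number of bytes written, a rough
    stand-in for the I/O cost that in-memory frameworks try to avoid.
    """
    bytes_written = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.pkl")
        with open(path, "wb") as f:
            pickle.dump(data, f)
        bytes_written += os.path.getsize(path)
        for n, stage in enumerate(stages):
            with open(path, "rb") as f:
                current = pickle.load(f)          # read previous stage's output
            path = os.path.join(tmp, f"stage{n}.pkl")
            with open(path, "wb") as f:
                pickle.dump(stage(current), f)    # persist this stage's output
            bytes_written += os.path.getsize(path)
        with open(path, "rb") as f:
            result = pickle.load(f)
    return result, bytes_written


stages: List[Stage] = [
    lambda xs: [x * x for x in xs],               # map: square each value
    lambda xs: [x for x in xs if x % 2 == 0],     # filter: keep even values
]
data = list(range(10))
in_memory = run_in_memory(data, stages)
on_disk, written = run_disk_based(data, stages)
print(in_memory)               # [0, 4, 16, 36, 64]
print(in_memory == on_disk)    # True
print(written > 0)             # True
```

Both paths produce identical results; the disk-based one simply pays for a write and a read at every stage boundary.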

Tags: Apache Spark benchmarks Hadoop MapReduce