The Shape of Data

Phil Renaud

Medium, Method, and Message

At Affinio, we’ve been building data tools for the better part of seven years now: platforms to help process, make sense of, and act upon complex datasets and social graphs. While every dataset is different, we’ve noticed some common threads that are worth discussing.

There’s an idea that data can be used to justify hypotheses that one has already established. This can be an appealing thought: data helps you tell your already-decided story, lending it a sense of infallibility. But we’ve come to consider using data as validation alone to be a shallow use of the sorts of insights we can extract from it.

A more meaningful endeavor, we’ve found, is to approach the data from an exploratory perspective: to let it tell its story, regardless of our prior biases. Our decisions around new features for Affinio tend to reflect our drive to allow users to act in this way: tools like baselining, cluster expansion, and trait graphs all support open-ended exploration. Even our graph visualization itself is designed to be exploratory: a typical clustered report’s graph has many nuances that all point to a discovery to be made about the clusters themselves.

However, novelty in visualization is not good for its own sake; we want to make sure that the exploratory interfaces we create help foster a better understanding of data. Especially given the many subjective interpretations one can arrive at by cherry-picking pieces of a dataset, we must also be careful to provide as all-encompassing and objective a view as possible.

Balancing this desire for objectivity with a drive to let users explore data deeply pulls us between very simple representations of data (say, a spreadsheet) and very complex ones (a network graph). There is still a wide range of options available to us to shape data in this way, but it seems clear that the Medium we use to visualize is often just as important as the Message, the data itself, when it comes to exploring insights.

Understanding Data in the Age of Busyness

Interpreting and responding to data is a more time-sensitive job than it used to be.

In the financial sphere, for example, reactive transactions have long removed the human element for want of speed (and consistency!), and in stock exchanges around the world, algorithmic trading has been the norm for years.

Personal judgment still has a role in high-leverage and strategic decisions (anything where the parameters aren’t as cut and dried as day trading), but with each passing year, the percentage of decisions we cede to machine intelligence rises. The reason seems clear: the advantage of arriving at decisions through lengthy and deep human understanding, compared to automating the same, grows thinner as the performance of our best neural networks and models improves.

Where we still value human judgment, in those strategic decisions made in boardrooms and C-suites across the world, we’d do well to concentrate on improving our ability to understand data more quickly. Exposing data in 2020 isn’t hard (our data lakes and warehouses provide us with a grand supply), but data on its own isn’t useful: data is only valuable insofar as it is understood. And if these tools and technologies allow the depth of our data to be exposed in ways it hasn’t been in the past, then our best remaining opportunity to improve our decision-making today comes down to the speed at which we can reach our conclusions.

Our goal in building a data tool, therefore, should be enabling its users to arrive at an understanding of their data, not just deeply, but quickly. It’s with this goal in mind that we’re very excited to release our latest product: Affinio Express.

It’s a mobile-first querying engine that leverages our graph and relevance technology to let users answer on-the-fly questions from their phones and laptops. Affinio has been synonymous with clustering data for as long as we’ve been around, and while that strategy remains important for digging deep into the hidden niches of communities and datasets, a vast number of questions can be answered by way of comparative querying. Even better, you can now answer those questions far faster than you might expect. If you haven’t read Jackie’s excellent introduction to Express, you can read it here.

The Shape of Data

Affinio Express launches with the ability to work across any dataset, just as Affinio’s clustering offering does. Additionally, we provide the Twitter social graph as a datasource that any of our users can leverage today — a familiar dataset that works nicely with both our Express and Clustering tools.

We’ve split the tools inside Express into what we call workflows: rules, outputs, and actions surrounding specific activities you’d like to carry out with your data. For example, perhaps with Twitter, you’d like to learn about the influential accounts that your brand’s followers also engage with: we provide a Persona Creation workflow that helps with exactly that.

This workflow mimics our clustering platform, with an exception worth noting: instead of taking hours, it takes on the order of 1 to 2 seconds. We shape the output so that, if you were asked to describe a brand’s followers and their interests, a clear and unambiguous account is presented: this is how many people match our query, this is where they live, this is how they self-describe, and these are the things they pay attention to.

For another use case, we’re often asked to compare and explain the differences between communities that occur within a dataset: for example, on Twitter, surfers and hikers. We have two ways to shape this data in the Compare workflow: first, as a distinct list of those traits that appear explicitly in either community, and second, as a list of the terms most relevant to each of them, ordered by the bias between them. Take a look at these examples:

We think this workflow is notable for its speed, clarity, and conciseness. While there is a whole world of traits that could appear between any two given communities online, we’re using our technology to ensure, very quickly, that we expose only the most relevant traits: that is, we consider not only how these communities are distinguished from one another, but also how they are distinguished from Twitter as a whole.
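To make the idea of "relevant to each community, and distinct from the whole" a little more concrete, here is a minimal sketch in Python. It is not how Express is implemented: the function name, the lift threshold, the smoothing constant, and the sample counts are all assumptions made up for illustration. It keeps only traits that are over-represented relative to a baseline population, then orders them by how strongly they lean toward one community or the other.

```python
from math import log2


def compare_traits(a_counts, a_size, b_counts, b_size, base_rates, min_lift=1.5):
    """Rank traits by bias between two communities (hypothetical helper).

    a_counts / b_counts: trait -> number of members showing the trait.
    base_rates: trait -> share of the whole population showing the trait.
    min_lift: assumed cut-off for "more common here than in the baseline".
    """
    eps = 1e-6  # smoothing so a zero count does not break the log ratio
    rows = []
    for trait in set(a_counts) | set(b_counts):
        rate_a = a_counts.get(trait, 0) / a_size
        rate_b = b_counts.get(trait, 0) / b_size
        base = base_rates.get(trait)
        if not base:
            continue  # no baseline rate, so relevance cannot be judged
        # Lift: how over-represented the trait is versus the baseline.
        lift = max(rate_a, rate_b) / base
        if lift < min_lift:
            continue  # common everywhere, so not distinctive
        # Bias: positive leans toward community A, negative toward B.
        bias = log2((rate_a + eps) / (rate_b + eps))
        rows.append((trait, bias, lift))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Invented sample data, for illustration only.
surfers = {"wetsuits": 400, "trail maps": 20, "coffee": 500}
hikers = {"wetsuits": 10, "trail maps": 450, "coffee": 520}
baseline = {"wetsuits": 0.02, "trail maps": 0.03, "coffee": 0.5}

ranked = compare_traits(surfers, 1000, hikers, 1000, baseline)
for trait, bias, lift in ranked:
    print(f"{trait}: bias={bias:+.2f}, lift={lift:.1f}")
```

In this toy data, a trait like "coffee" is popular in both communities but no more so than in the baseline, so it is filtered out before ranking; that filtering step is the part that corresponds to comparing against Twitter as a whole.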

Another way to visualize, or shape, a comparative query can be found in our Discovery workflow. We think this interface meets the criteria set out above: it should be both exploratory and objective. Running a query in Discovery means that you’re looking to understand a trend or niche more deeply than an ordered list would provide.

Discovery uses a visualization called a Bee Swarm plot to fill your screen with relevant traits that explain the differences and commonalities between communities. Size and position are encoded with meaning: traits are larger when they occur more frequently across the audiences, and they group together vertically in accordance with their bias. If you haven’t already had the chance, check out Ryan Hogg’s brief look into some of the shapes that emerge in this workflow.
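The size-and-position encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Discovery implementation: the function name, the pixel dimensions, and the convention that bias runs from -1 to 1 are all hypothetical, and the horizontal collision-avoidance step that gives a bee swarm its shape is left out.

```python
from math import sqrt


def encode_traits(traits, height=600.0, min_radius=4.0, max_radius=40.0):
    """Map traits to circle marks (hypothetical helper).

    traits: list of (name, frequency, bias), with bias assumed in [-1, 1].
    Returns one dict per trait with a vertical position and a radius.
    """
    max_freq = max(freq for _, freq, _ in traits)
    marks = []
    for name, freq, bias in traits:
        # Square-root scaling so circle area, not radius, tracks frequency;
        # very rare traits are clamped to a minimum readable size.
        radius = max(min_radius, max_radius * sqrt(freq / max_freq))
        # Bias +1 sits at the top (y = 0), bias -1 at the bottom (y = height).
        y = (1.0 - bias) / 2.0 * height
        marks.append({"name": name, "y": y, "radius": radius})
    return marks


# Invented sample data, for illustration only.
marks = encode_traits([
    ("wetsuits", 400, 1.0),
    ("coffee", 100, 0.0),
    ("trail maps", 1, -1.0),
])
for mark in marks:
    print(mark)
```

Traits with a similar bias end up at a similar height, which is what lets them read as a group; a full layout would then nudge the circles sideways so they do not overlap.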

We think that, with Express and its workflows, we’re able to tap into something fundamentally new for complex data: that it can be both exploratory and decisive. It gives you very fast results, and at the same time, leads you to dig deeper and ask further questions.

You can try it at — we think you’ll love it.