Hemant Vishwakarma: How to make a bump chart

Sunday, 25 July 2021

How to make a bump chart

I have a table of ranked data that I'd like to visualise as a bump chart or a slope chart.

I have an idea of how to plot one, but if there's one thing I've learnt about pandas, it's that there's usually some combination of melting, merging, foaming and fiddling that'll do the job in a one-liner. AKA elegant pandas, not scrambling pandas.

The data looks a bit like this (the real table has many more rows):

	ed_name	source
2562	edition_3	gq
2956	edition_8	warontherocks
10168	edition_12	aeon.co
1137	edition_14	hbr.org
4573	edition_13	thesmartnik
7143	edition_16	vijayboyapati.medium
9674	edition_15	medium
5555	edition_9	smh.au
8831	edition_11	salon
8215	edition_14	thegospelcoalition.org

and so on, where each row is an article, and the source is the place where that article came from. The goal is to find out, per edition, which sources contribute the most articles.
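
To make that goal concrete, here is a minimal sketch on a tiny made-up table (the rows below are invented for illustration; only the column names `ed_name` and `source` and the DataFrame name `printed` come from the question). It counts articles per source per edition with `pd.crosstab` and then picks the biggest contributor in each edition.

```python
import pandas as pd

# Made-up stand-in for the real table: one row per article.
printed = pd.DataFrame(
    {
        "ed_name": ["edition_1", "edition_1", "edition_1",
                    "edition_2", "edition_2", "edition_2"],
        "source": ["medium", "medium", "salon", "salon", "salon", "gq"],
    }
)

# Rows are editions, columns are sources, values are article counts.
counts = pd.crosstab(printed["ed_name"], printed["source"])
print(counts)

# The source with the most articles in each edition.
top_source = counts.idxmax(axis="columns")
print(top_source)
```

A bump chart then needs the full ordering within each edition rather than just the top entry, which is what the ranking step is for.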

Here's my attempt to clumsily convert it to a bad bump chart:

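import pandas as pd
import matplotlib.pyplot as plt
from random import choice

# `printed` is the DataFrame shown above: one row per article,
# with the columns "ed_name" and "source".

# Rank the sources within each edition by article count (1 = most articles).
all_sources = set(printed.source)  # assumed: `sources` was this column
source_rankings = {s: {} for s in all_sources}
for ed_name, df in printed.groupby("ed_name"):
    vc = df.source.value_counts()  # sorted, most frequent first
    for i, x in enumerate(vc.index):
        source_rankings[x][ed_name] = i + 1
# Rows are editions, columns are sources; NaN where a source is absent.
ranks = pd.DataFrame(source_rankings)

# Drop every source that ranks below 30th place in any edition.
cols_to_drop = []
for name, values in ranks.items():  # iteritems() was removed in pandas 2.0
    ranked_too_low = (values > 30).any()  # NaN > 30 evaluates to False
    if ranked_too_low:
        cols_to_drop.append(name)
only_interesting = ranks.drop(labels=cols_to_drop, axis="columns")

# Sort editions by their number ("edition_12" -> 12), highest first.
only_interesting.sort_index(
    axis=0, inplace=True,
    key=lambda col: [int(x.split("_")[1]) for x in col],
    ascending=False,
)

linestyles = ["-", "--", "-.", ":"]

plt.plot(only_interesting, alpha=0.8, linewidth=1)
plt.ylim(25, 0)  # reversed limits put rank 1 at the top
plt.gca().invert_xaxis()
plt.xticks(rotation=70)
plt.title("Popularity of publisher by edition")

# Emphasise and label the sources that are ranked in more than 10 editions.
editions_that_rank_threshold = 10
for name, values in only_interesting.items():
    if values.count() > editions_that_rank_threshold:  # count() skips NaN
        # Put the label at the first point where the source has a rank.
        first = values.first_valid_index()
        plt.annotate(text=name, xy=(first, values[first]))
        plt.plot(values, linewidth=5, linestyle=choice(linestyles))

plt.xlabel("Edition")
plt.ylabel("Rank (1 at the top)")
plt.show()  # the original called plt.close() here, which discards the figure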

Which gives something like:

This leaves a lot to be desired, to say the least. Much of it could be fixed by grinding away with standard matplotlib tweaks, but I'm hesitant to do that as it feels inelegant, and there's probably a built-in bump chart method that I'm missing.

A similar question has been asked before, but its answer solves the problem with a slope chart. Slope charts look great, but they are a different type of chart.

Is there a more elegant way to do this?
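
One possible, more compact route is sketched below. I'm not aware of a dedicated bump chart function in pandas or matplotlib, so this only shortens the ranking step with `groupby`, `rank` and `unstack` and then draws one line per source. The sample rows are made up for illustration, and `method="first"` is an assumption about how ties should be broken.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the real article table.
printed = pd.DataFrame(
    {
        "ed_name": ["edition_1"] * 6 + ["edition_2"] * 6 + ["edition_3"] * 6,
        "source": [
            "medium", "medium", "medium", "salon", "salon", "gq",
            "salon", "salon", "salon", "gq", "gq", "medium",
            "gq", "gq", "gq", "medium", "medium", "salon",
        ],
    }
)

# Articles per (edition, source), then rank within each edition (1 = most).
counts = printed.groupby(["ed_name", "source"]).size()
ranks = (
    counts.groupby(level="ed_name")
    .rank(method="first", ascending=False)
    .unstack("source")  # rows: editions, columns: sources
)

# Order editions by number so "edition_10" would come after "edition_9".
ranks = ranks.sort_index(
    key=lambda idx: idx.map(lambda name: int(name.split("_")[1]))
)

# One line per source; the markers make each change of rank visible.
fig, ax = plt.subplots()
for source, series in ranks.items():
    ax.plot(series.index, series.to_numpy(), marker="o", linewidth=2)
    # Label each line next to its last ranked edition.
    last = series.last_valid_index()
    ax.annotate(source, xy=(last, series.loc[last]), xytext=(8, 0),
                textcoords="offset points", va="center")

ax.invert_yaxis()  # rank 1 at the top
ax.set_yticks(range(1, len(ranks.columns) + 1))
ax.set_xlabel("Edition")
ax.set_ylabel("Rank (1 at the top)")
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure. With the real data it would probably still be worth filtering to the sources that rank highly or appear in many editions before plotting, as in the attempt above.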

from How to make a bump chart
