(I'm still working on getting code snippets to look good in these posts. Bear with me.)
First, we'll create some fake data by randomly creating 100 request records for each of three students, each consisting of an application and an action:
import numpy as np
import pandas as pd
application = ['files', 'discussion', 'assignment', 'quiz']
action = ['show', 'index', 'create']
names = np.array(['alice', 'bob', 'charlie'])
student = np.repeat(names, [100, 100, 100], axis=0)
applications = np.random.choice(application, 300, replace=True)
actions = np.random.choice(action, 300, replace=True)
df = pd.DataFrame({'id':student, 'web_application':applications, 'web_action':actions})
df.head()
This gives us the following:
Next, we create a create a table of the count of requests per student, using Pandas crosstab:
df_cross = pd.crosstab([df.id, df.web_action], df.web_application)
df_cross
This results in:
This was what I wanted, but I couldn't figure out a good way to plot the results. After lots of searching, I came across an answer (I'll try to track down the original) which involved "flattening" or "tidying" the table, using Pandas melt function (which is just like R's melt). Note that we have to reset the index as part of the process.
df_cross_melt = pd.melt(df_cross.reset_index(), id_vars=['id', 'web_action'], var_name='web_application', value_name = 'count')
df_cross_melt.head()
From there, it is possible to use Seaborn's factorplot to visualize the results:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
g = sns.factorplot(x='web_application',
y='count',
hue='web_action',
col='id',
kind='bar',
data=df_cross_melt,
sharey=True)
g.set_xticklabels(rotation=90)
This produces the following chart:
I'm sure that there are much more efficient ways to do this, but this worked for me and got me the result that I needed.

No comments:
Post a Comment