DISCUSSION: Data for pandas examples #150

datapythonista · 2019-08-21T16:24:19Z

Very often in the pandas documentation, simple DataFrame objects are created to show examples. Many of them just use random data; see for example https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#id1

>>> df = pandas.DataFrame(numpy.random.randn(5, 2), columns=list('AB'))
>>> df
          A         B
0  0.469112 -0.282863
1 -1.509059 -1.135632
2  1.212112 -0.173215
3  0.119209 -1.044236
4 -0.861849 -2.104569

Then, if I want to show an operation, I can get something like:

>>> df + 2
          A         B
0  2.469112  1.717137
1  0.490941  0.864368
2  3.212112  1.826785
3  2.119209  0.955764
4  1.138151 -0.104569

And in my opinion the example is quite useless (beyond showing the syntax), because if you don't know what the operation does, the example does not help you understand it.

The best example I could find to overcome that (probably not great, but the best I could find) is:

>>> df = pandas.DataFrame({"num_arms": [0, 0, 2],
...                        "num_legs": [4, 4, 2]},
...                       index=["dog", "cat", "monkey"])
>>> df
        num_arms  num_legs
dog            0         4
cat            0         4
monkey         2         2

Then, when performing an operation, it is easy to guess what it's doing, or to double-check a guess you already have:

>>> df + 2
        num_arms  num_legs
dog            2         6
cat            2         6
monkey         4         4

We are already using some of those in some examples: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename_axis.html

While this worked well in some places, we found this dataset insufficient to show all pandas functionality. And while we initially wanted to standardize the data used in the examples, so things are easier for recurring users, we eventually forgot about it.

But while it's surely not simple, I think it'd be ideal if we could find a very small number of datasets that can be used in all pandas examples. The ones I think we surely need are:

A simple example like the one proposed
One with a MultiIndex (probably on both axes)
A timeseries dataset

If we're able to find the ones we need, I think it'd also be great if we could have something like:

>>> import pandas
>>> animals = pandas.sample_data('animals')
>>> animals
        num_arms  num_legs
dog            0         4
cat            0         4
monkey         2         2

That should make the examples much simpler, and directly show the point they are trying to show. See for example the MultiIndex example here, how creating the DataFrame distracts from the operation shown: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
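
A minimal sketch of what such a helper could look like. Note that `sample_data` is hypothetical: it is the function proposed above and is not part of pandas, and keeping the datasets as in-code literals is just an assumption made for this illustration.

```python
import pandas


def sample_data(name):
    """Return a small, fixed example DataFrame by name.

    Hypothetical helper sketching the proposal above; not a pandas API.
    """
    if name == "animals":
        return pandas.DataFrame(
            {"num_arms": [0, 0, 2], "num_legs": [4, 4, 2]},
            index=["dog", "cat", "monkey"],
        )
    raise ValueError(f"unknown sample dataset: {name!r}")


animals = sample_data("animals")
# The example can now focus on the operation itself.
print(animals + 2)
```

With something like this, a docstring example only needs one line of setup before showing the operation.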

@python-sprints/pandas-mentoring thoughts? Ideas on datasets?

martinagvilas · 2019-08-22T08:46:15Z

@datapythonista I really like this idea! As a pandas user, I completely agree that it's difficult to understand the documentation of some functions when the example is based on random numbers.

One question: are you thinking of using examples already uploaded somewhere on the internet, or creating the examples ourselves? (Or are we open to both ideas?)

datapythonista · 2019-08-22T09:18:27Z

I'm open to any idea. But my personal opinion is that it will make things easier for users if we use datasets that are as small as possible. Showing .head() will require datasets longer than 10 rows, while df.loc[1, 'col'] = 0. will probably just require a 3x2 dataset, so "as small as possible" depends on the case. But I don't think it makes sense to use the whole Titanic dataset to show any of that.
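
To illustrate the second case, here is a rough sketch with a made-up 3x2 frame. The column names and values are assumptions chosen only for this example; the point is that three rows are already enough to see what a label-based assignment changes.

```python
import pandas

# A 3x2 frame with the default integer index (0, 1, 2).
df = pandas.DataFrame({"col": [1.0, 2.0, 3.0], "other": [10, 20, 30]})

# Set a single cell by row label and column label.
df.loc[1, "col"] = 0.
print(df)
```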

bhavaniravi · 2019-08-22T14:02:00Z

I am thinking of a central theme and multiple datasets around it. That will make the documentation read like a story for different use cases.

For example: a students dataset, an HR system, or a sales system.

I have a pandas article written in a similar fashion, for your reference.
https://medium.com/bhavaniravi/python-pandas-tutorial-92018da85a33
https://medium.com/bhavaniravi/learn-pandas-via-usecases-part-2-e1503892191b

martinagvilas · 2019-08-25T16:13:45Z

Based on the idea of @bhavaniravi of keeping the same theme across the three different types of dataframes that @datapythonista mentioned, I came up with the following examples (ignore the actual numbers in the dataframe, they won't make sense):

We could create a MultiIndex dataframe taking as an example a company with multiple branches in different cities that wants to know how each of its sales departments performed on 4 profit variables (units sold, total revenue, cost of produced goods and operating costs), measured yearly since 2013:

Year                                       2013                                                               2014                                                               2015                                                               2016                                                               2017                                                               2018                                                        
Profit                               Units sold Total sales revenue Costs of goods sold Operating costs Units sold Total sales revenue Costs of goods sold Operating costs Units sold Total sales revenue Costs of goods sold Operating costs Units sold Total sales revenue Costs of goods sold Operating costs Units sold Total sales revenue Costs of goods sold Operating costs Units sold Total sales revenue Costs of goods sold Operating costs
City           Department                                                                                                                                                                                                                                                                                                                                                                                                                             
New York       Women's clothing            38.0                34.7                44.0            36.1       40.0                35.9                27.0            34.9       52.0                38.0                43.0            37.0       40.0                36.2                19.0            37.4       41.0                35.8                46.0            35.5       36.0                36.6                31.0            36.7
               Men's clothing              48.0                35.1                44.0            38.6       44.0                38.2                32.0            36.4       32.0                38.6                33.0            35.5       41.0                36.7                33.0            36.9       54.0                37.3                51.0            36.5       33.0                37.0                28.0            37.2
               Kid's clothing              31.0                37.8                45.0            36.1       19.0                37.1                46.0            37.4       45.0                37.2                25.0            36.4       46.0                36.3                46.0            38.0       36.0                36.1                51.0            36.0       32.0                36.0                38.0            39.1
               Shoes                       36.0                37.6                48.0            36.3       35.0                37.6                33.0            38.5       46.0                38.0                25.0            36.6       29.0                38.1                29.0            36.4       33.0                37.2                22.0            35.5       30.0                36.7                23.0            34.6
               Handbags & Suitcases        44.0                37.0                50.0            35.6       44.0                34.8                51.0            35.4       47.0                37.1                36.0            36.4       19.0                36.9                31.0            36.3       50.0                37.2                42.0            38.4       39.0                37.7                42.0            35.3
               Jewellery                   26.0                37.1                37.0            36.4       38.0                36.6                36.0            39.2       46.0                35.9                32.0            37.7       26.0                36.3                39.0            37.1       36.0                39.1                39.0            36.5       44.0                38.3                31.0            37.7
               Drugstore                   26.0                35.1                27.0            36.7       36.0                35.8                37.0            37.5       28.0                38.5                33.0            36.0       33.0                37.0                30.0            36.8       38.0                34.9                29.0            36.1       62.0                37.9                40.0            36.7
               Toys                        44.0                37.1                43.0            38.7       35.0                38.7                37.0            36.9       27.0                36.6                28.0            36.2       54.0                34.7                24.0            37.7       41.0                37.5                36.0            37.6       47.0                37.7                39.0            36.6
               Sports                      53.0                36.3                33.0            37.3       29.0                36.0                50.0            37.0       30.0                35.6                26.0            39.0       43.0                35.9                33.0            37.4       35.0                36.4                38.0            38.1       36.0                36.2                48.0            36.3
               Home appliances             20.0                36.7                31.0            36.1       35.0                36.8                33.0            37.9       22.0                37.2                30.0            38.3       59.0                38.2                52.0            36.1       39.0                37.8                41.0            38.8       56.0                35.3                53.0            38.7
               Home decoration             21.0                36.6                20.0            36.5       27.0                36.9                54.0            36.2       26.0                36.3                48.0            37.1       48.0                37.2                37.0            38.6       49.0                36.2                33.0            36.4       22.0                36.6                44.0            36.8
               Electronics                 23.0                36.7                19.0            36.9       35.0                36.2                36.0            36.9       27.0                36.8                36.0            37.2       53.0                36.9                35.0            38.0       24.0                36.8                30.0            36.8       43.0                38.1                40.0            37.0
               Books & Office supply       32.0                37.2                38.0            37.9       42.0                36.7                21.0            36.9       40.0                35.9                41.0            37.7       31.0                37.3                52.0            36.7       50.0                38.6                56.0            37.3       44.0                38.4                30.0            36.5
               Gourmet food & Wine         58.0                38.6                37.0            36.6       35.0                36.8                34.0            35.6       47.0                35.5                51.0            35.6       46.0                38.2                46.0            37.1       24.0                35.9                25.0            37.7       41.0                37.1                40.0            36.0
Paris          Women's clothing            33.0                37.9                13.0            37.7       56.0                37.4                29.0            37.1       36.0                36.4                26.0            37.2       42.0                37.0                49.0            36.3       21.0                37.0                35.0            36.7       56.0                36.5                44.0            37.8
               Men's clothing              29.0                35.7                22.0            37.5       48.0                39.1                24.0            36.3       48.0                37.9                50.0            36.2       60.0                36.9                33.0            36.3       37.0                36.2                19.0            37.0       30.0                37.2                22.0            38.3
               Kid's clothing              17.0                36.7                26.0            38.7       31.0                35.0                38.0            37.0       46.0                36.6                42.0            37.6       22.0                38.9                37.0            35.9       23.0                36.7                34.0            36.4       24.0                36.2                36.0            37.1
               Shoes                       55.0                37.0                18.0            37.1       28.0                38.7                35.0            36.1       24.0                36.9                45.0            37.6       52.0                36.4                36.0            39.3       35.0                37.1                49.0            37.9       41.0                37.6                13.0            34.8
               Handbags & Suitcases        25.0                36.2                27.0            37.0       37.0                36.6                38.0            37.6       48.0                36.0                16.0            37.0       29.0                36.6                40.0            36.3       36.0                37.2                57.0            36.0       26.0                38.1                25.0            37.0
               Jewellery                   40.0                36.6                35.0            37.1       37.0                35.5                46.0            36.0        9.0                36.7                51.0            38.2       40.0                37.4                38.0            37.9       23.0                36.8                31.0            37.8       20.0                38.3                33.0            36.7
               Drugstore                   10.0                37.0                54.0            38.4       35.0                37.3                38.0            36.4       51.0                35.5                24.0            37.6       35.0                37.8                26.0            36.7       48.0                38.4                24.0            37.5       30.0                36.9                38.0            37.7
               Toys                        25.0                38.3                49.0            38.7       51.0                37.4                35.0            38.4       32.0                36.4                32.0            36.1       44.0                36.1                44.0            37.0       40.0                37.3                25.0            37.6       46.0                37.2                38.0            37.2
               Sports                      37.0                37.8                20.0            36.6       19.0                38.2                34.0            36.7       52.0                37.3                26.0            35.4       32.0                36.1                33.0            38.4       44.0                37.1                41.0            37.4       22.0                38.8                26.0            35.5
               Home appliances             47.0                36.9                32.0            37.6       25.0                35.5                37.0            36.4       42.0                36.9                31.0            36.0       32.0                37.5                34.0            36.9       41.0                35.3                26.0            35.8       32.0                37.1                19.0            36.1
               Home decoration             48.0                36.7                44.0            36.8       40.0                36.8                42.0            37.6       33.0                37.1                48.0            36.3       45.0                38.6                51.0            37.7       47.0                38.1                44.0            38.5       27.0                37.7                37.0            37.6
               Electronics                 25.0                37.1                43.0            38.1       21.0                36.8                31.0            36.5       29.0                37.5                32.0            36.8       20.0                36.7                24.0            37.0       45.0                37.0                38.0            37.6       45.0                36.4                26.0            36.5
               Books & Office supply       39.0                36.9                32.0            36.1       24.0                38.6                47.0            39.0       46.0                36.5                30.0            37.6       55.0                36.7                38.0            38.3       31.0                37.2                31.0            36.6       45.0                36.7                51.0            36.9
               Gourmet food & Wine         32.0                36.7                50.0            38.9       25.0                37.4                29.0            37.1       40.0                36.0                30.0            39.9       25.0                36.1                24.0            37.2       36.0                38.3                35.0            37.2       63.0                39.1                22.0            38.2
Cape Town      Women's clothing            29.0                35.8                20.0            38.0       63.0                37.4                36.0            35.7       34.0                37.2                52.0            36.9       42.0                36.2                34.0            36.7       52.0                37.6                18.0            37.0       24.0                37.7                48.0            36.4
               Men's clothing              19.0                36.3                33.0            36.9       36.0                37.8                56.0            36.2       41.0                36.6                29.0            36.4       39.0                37.3                33.0            36.3       29.0                37.1                34.0            36.3       30.0                38.0                38.0            37.4
               Kid's clothing              40.0                36.4                26.0            36.8       41.0                35.4                35.0            38.1       41.0                37.8                26.0            38.5       37.0                37.8                29.0            35.8       36.0                36.3                39.0            37.5       25.0                37.2                34.0            35.5
               Shoes                       22.0                37.4                39.0            37.2       39.0                35.9                33.0            36.8       24.0                35.7                19.0            36.6       30.0                35.8                37.0            35.8       39.0                37.0                47.0            37.1       21.0                35.7                43.0            37.8
               Handbags & Suitcases        39.0                37.6                24.0            36.2       28.0                37.2                45.0            37.1       50.0                35.5                21.0            36.6       19.0                37.0                34.0            34.5       34.0                37.6                47.0            35.5       40.0                36.6                54.0            37.5
               Jewellery                   34.0                36.4                26.0            37.7       56.0                38.3                28.0            34.9       46.0                35.8                41.0            37.4       44.0                36.0                48.0            36.4       42.0                35.7                36.0            36.5       41.0                35.9                53.0            37.1
               Drugstore                   51.0                37.9                53.0            38.5       39.0                38.2                39.0            35.9       25.0                37.0                17.0            36.2       33.0                36.9                33.0            37.5       41.0                36.8                26.0            36.5       26.0                36.9                24.0            36.7
               Toys                        40.0                36.2                40.0            36.1       35.0                35.5                34.0            37.7       31.0                35.4                30.0            37.0       21.0                35.8                43.0            37.0       47.0                35.9                47.0            36.7       47.0                37.8                14.0            34.8
               Sports                      32.0                38.0                37.0            39.0       34.0                36.3                59.0            39.0       50.0                35.8                36.0            34.2       43.0                36.9                44.0            37.7       28.0                39.1                35.0            38.9       41.0                36.7                39.0            36.5
               Home appliances             34.0                36.4                40.0            36.1       54.0                37.7                35.0            36.2       46.0                37.0                33.0            37.0       48.0                38.0                23.0            36.6       22.0                37.2                42.0            37.4       39.0                37.5                37.0            39.4
               Home decoration             26.0                36.3                34.0            38.1       29.0                37.7                33.0            37.6       33.0                37.7                -1.0            37.6       41.0                38.0                41.0            36.2       40.0                35.9                30.0            35.8       39.0                37.9                27.0            36.0
               Electronics                 51.0                36.2                39.0            36.8       49.0                35.9                31.0            37.1       55.0                38.1                39.0            37.8       42.0                36.7                38.0            37.3       41.0                36.6                45.0            38.0       36.0                37.4                40.0            36.3
               Books & Office supply       44.0                36.4                28.0            35.5       29.0                36.5                39.0            37.2       35.0                37.1                45.0            35.7       47.0                38.0                35.0            38.0       28.0                38.3                15.0            37.5       33.0                37.3                32.0            35.9
               Gourmet food & Wine         41.0                38.3                45.0            37.5       37.0                38.8                34.0            37.7       51.0                35.8                26.0            37.7       48.0                34.9                47.0            36.5       28.0                38.5                34.0            36.3       29.0                38.4                31.0            38.1
Tokyo          Women's clothing            22.0                37.7                48.0            38.3       41.0                36.4                34.0            37.8       29.0                35.1                31.0            35.2       36.0                36.0                43.0            38.6       13.0                37.7                51.0            37.0       34.0                36.5                34.0            37.0
               Men's clothing              49.0                38.9                38.0            35.8       38.0                35.8                35.0            36.9       33.0                37.3                44.0            37.1       29.0                36.3                40.0            35.8       37.0                35.4                42.0            38.4       26.0                35.6                34.0            36.6
               Kid's clothing              30.0                37.9                39.0            37.2       45.0                38.8                51.0            37.0       19.0                37.5                40.0            37.9       36.0                36.5                41.0            38.3       34.0                37.2                48.0            36.0       23.0                37.0                20.0            37.1
               Shoes                       34.0                36.1                41.0            38.8       39.0                38.7                33.0            36.9       31.0                38.8                31.0            35.9       36.0                37.1                41.0            36.0       36.0                37.6                43.0            38.2       38.0                37.9                38.0            36.3
               Handbags & Suitcases        40.0                38.8                45.0            35.7       35.0                36.9                28.0            35.5       43.0                39.1                52.0            37.4       21.0                35.8                47.0            35.7       34.0                38.1                47.0            38.7       30.0                36.5                41.0            36.8
               Jewellery                   44.0                37.5                33.0            37.2       29.0                36.3                24.0            36.7       40.0                38.3                42.0            36.1       25.0                36.9                19.0            36.8       40.0                38.7                14.0            36.8       14.0                36.8                36.0            36.6
               Drugstore                   36.0                36.0                31.0            36.8       40.0                37.7                39.0            37.7       16.0                37.0                32.0            36.2       35.0                35.9                32.0            38.2       50.0                37.5                38.0            35.9       29.0                38.2                47.0            37.1
               Toys                        36.0                36.1                17.0            39.5       44.0                36.9                39.0            37.3       55.0                38.4                41.0            37.3       26.0                35.7                23.0            36.2       36.0                37.8                26.0            35.8       55.0                36.6                49.0            37.4
               Sports                      53.0                35.3                48.0            36.6       52.0                37.9                42.0            36.4       44.0                37.0                40.0            36.7       49.0                35.6                20.0            36.0       48.0                36.5                34.0            38.6       28.0                36.9                36.0            36.3
               Home appliances             23.0                36.6                33.0            35.3       44.0                37.8                48.0            38.5       47.0                37.1                47.0            37.3       39.0                36.7                43.0            36.7       34.0                36.0                37.0            37.3       17.0                37.6                14.0            38.4
               Home decoration             38.0                37.2                46.0            36.6       26.0                35.7                28.0            35.1       51.0                36.0                15.0            38.2       37.0                36.1                27.0            36.8       46.0                37.5                43.0            37.7       48.0                38.6                34.0            37.0
               Electronics                 46.0                36.0                50.0            36.6       29.0                36.8                30.0            36.5       55.0                37.5                45.0            37.0       38.0                36.4                63.0            36.2       36.0                37.7                44.0            36.0       35.0                36.7                37.0            36.5
               Books & Office supply       32.0                35.6                35.0            35.7       33.0                38.0                51.0            37.0       45.0                35.6                47.0            37.4       18.0                35.7                29.0            36.7       36.0                35.2                26.0            38.5       20.0                37.5                36.0            38.2
               Gourmet food & Wine         25.0                36.3                37.0            36.6       36.0                37.8                44.0            37.0       51.0                36.7                38.0            36.8       28.0                37.8                35.0            37.3       35.0                38.0                47.0            38.2       24.0                37.1                56.0            37.7
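
For reference, a frame with this layout could be built roughly as follows. This is only a sketch: the department list is shortened, and the values are placeholder integers from a seeded generator (as noted above, the actual numbers are not meaningful).

```python
import numpy
import pandas

cities = ["New York", "Paris", "Cape Town", "Tokyo"]
# Shortened department list, to keep the sketch small.
departments = ["Women's clothing", "Men's clothing", "Kid's clothing", "Shoes"]
years = list(range(2013, 2019))
variables = ["Units sold", "Total sales revenue",
             "Costs of goods sold", "Operating costs"]

# MultiIndex on both axes: (City, Department) rows, (Year, Profit) columns.
index = pandas.MultiIndex.from_product(
    [cities, departments], names=["City", "Department"])
columns = pandas.MultiIndex.from_product(
    [years, variables], names=["Year", "Profit"])

# Placeholder values; a fixed seed keeps the docs output reproducible.
rng = numpy.random.default_rng(0)
values = rng.integers(10, 60, size=(len(index), len(columns))).astype(float)

sales = pandas.DataFrame(values, index=index, columns=columns)
print(sales.shape)  # (16, 24)
```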

We could obtain a simple version of the dataframe by keeping one of these branches and its profit variables during a specific year, e.g.:

Profit                 Units sold  Total sales revenue  Costs of goods sold  Operating costs
Department                                                                                  
Women's clothing             41.0                 35.8                 46.0             35.5
Men's clothing               54.0                 37.3                 51.0             36.5
Kid's clothing               36.0                 36.1                 51.0             36.0
Shoes                        33.0                 37.2                 22.0             35.5
Handbags & Suitcases         50.0                 37.2                 42.0             38.4
Jewellery                    36.0                 39.1                 39.0             36.5
Drugstore                    38.0                 34.9                 29.0             36.1
Toys                         41.0                 37.5                 36.0             37.6
Sports                       35.0                 36.4                 38.0             38.1
Home appliances              39.0                 37.8                 41.0             38.8
Home decoration              49.0                 36.2                 33.0             36.4
Electronics                  24.0                 36.8                 30.0             36.8
Books & Office supply        50.0                 38.6                 56.0             37.3
Gourmet food & Wine          24.0                 35.9                 25.0             37.7

And finally, the timeseries data could be obtained by keeping only one of these variables across the years (in this example it would be units sold across the years):

Year                   2013  2014  2015  2016  2017  2018
Department                                               
Women's clothing       38.0  40.0  52.0  40.0  41.0  36.0
Men's clothing         48.0  44.0  32.0  41.0  54.0  33.0
Kid's clothing         31.0  19.0  45.0  46.0  36.0  32.0
Shoes                  36.0  35.0  46.0  29.0  33.0  30.0
Handbags & Suitcases   44.0  44.0  47.0  19.0  50.0  39.0
Jewellery              26.0  38.0  46.0  26.0  36.0  44.0
Drugstore              26.0  36.0  28.0  33.0  38.0  62.0
Toys                   44.0  35.0  27.0  54.0  41.0  47.0
Sports                 53.0  29.0  30.0  43.0  35.0  36.0
Home appliances        20.0  35.0  22.0  59.0  39.0  56.0
Home decoration        21.0  27.0  26.0  48.0  49.0  22.0
Electronics            23.0  35.0  27.0  53.0  24.0  43.0
Books & Office supply  32.0  42.0  40.0  31.0  50.0  44.0
Gourmet food & Wine    58.0  35.0  47.0  46.0  24.0  41.0

Of course it doesn't have to be this particular example :) I guess the main point is that from the most complex dataframe (the one with the MultiIndex) you can derive the two others (the simple and the time-series ones).
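
A sketch of how the two smaller frames could be derived from the MultiIndex one with `DataFrame.xs`. The frame is rebuilt here with a shortened department list and placeholder values so the snippet runs on its own; the variable names are assumptions made for illustration.

```python
import numpy
import pandas

cities = ["New York", "Paris", "Cape Town", "Tokyo"]
departments = ["Women's clothing", "Men's clothing", "Kid's clothing", "Shoes"]
years = list(range(2013, 2019))
variables = ["Units sold", "Total sales revenue",
             "Costs of goods sold", "Operating costs"]

index = pandas.MultiIndex.from_product(
    [cities, departments], names=["City", "Department"])
columns = pandas.MultiIndex.from_product(
    [years, variables], names=["Year", "Profit"])
values = numpy.random.default_rng(0).integers(
    10, 60, size=(len(index), len(columns))).astype(float)
sales = pandas.DataFrame(values, index=index, columns=columns)

# Keep one branch: rows are now indexed by Department only.
new_york = sales.xs("New York", level="City")

# Simple frame: one branch, one year, all profit variables.
simple = new_york.xs(2017, axis=1, level="Year")

# Time-series-like frame: one branch, one variable, all years.
units_sold = new_york.xs("Units sold", axis=1, level="Profit")

print(simple)
print(units_sold)
```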

galuhsahid · 2019-08-31T09:34:40Z

I agree @martinagvilas, I think it's also helpful to have the same dataset to represent a complex dataframe, a simpler one, and other variations. I imagine a lot of people would be interested in transforming one to the other as well.

datapythonista added the pandas label Aug 21, 2019
