Basic DataFrame Analysis

  • sum

  • min

  • max

  • median

  • mean

  • count

  • describe
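As a quick orientation, the methods listed above can be tried on a tiny DataFrame before running them on the real datasets. This is a minimal sketch: the `df` frame and its values are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy data, not taken from the datasets used in this notebook.
df = pd.DataFrame({"price": [100.0, 250.0, 400.0], "bedrooms": [1, 3, 2]})

total = df.sum()        # column-wise totals
lowest = df.min()       # smallest value in each column
highest = df.max()      # largest value in each column
middle = df.median()    # middle value of each column
average = df.mean()     # arithmetic mean of each column
n_values = df.count()   # number of non-null values per column
summary = df.describe() # count, mean, std, min, quartiles, max in one table

print(summary)
```

Each of the single-statistic methods returns one value per column, while `describe` bundles several statistics into one table.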

See the official pandas documentation for details on each of these methods.

import pandas as pd
houses = pd.read_csv("data/kc_house_data.csv")
houses

Min, Max

houses.min()
id               1000102
date             20140502T000000
price            75000.0
bedrooms         0
bathrooms        0.0
sqft_living      290
sqft_lot         520
floors           1.0
waterfront       0
view             0
condition        1
grade            1
sqft_above       290
sqft_basement    0
yr_built         1900
yr_renovated     0
zipcode          98001
lat              47.1559
long             -122.519
sqft_living15    399
sqft_lot15       651
dtype: object
houses.max()
id               9900000190
date             20150527T000000
price            7700000.0
bedrooms         33
bathrooms        8.0
sqft_living      13540
sqft_lot         1651359
floors           3.5
waterfront       1
view             4
condition        5
grade            13
sqft_above       9410
sqft_basement    4820
yr_built         2015
yr_renovated     2015
zipcode          98199
lat              47.7776
long             -121.315
sqft_living15    6210
sqft_lot15       871200
dtype: object
type(houses.max())
pandas.core.series.Series
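Because an aggregation such as `max()` returns a Series indexed by column name, individual results can be looked up by label. The sketch below illustrates this on a made-up frame (`df` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({"price": [100.0, 250.0, 400.0], "bedrooms": [1, 3, 2]})

maxima = df.max()            # a Series: one maximum per column
print(type(maxima))
print(list(maxima.index))    # the index holds the column names

top_price = maxima["price"]  # look up one column's result by label
col_max = df["price"].max()  # same value, computed on a single column
```

Calling the method on a single column returns a scalar directly, which is often more convenient when only one statistic is needed.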
houses.sum()
id               98994056770455
date             20141013T00000020141209T00000020150225T0000002...
price            11672925008.0
bedrooms         72854
bathrooms        45706.25
sqft_living      44952873
sqft_lot         326506890
floors           32296.5
waterfront       163
view             5064
condition        73688
grade            165488
sqft_above       38652488
sqft_basement    6300385
yr_built         42599334
yr_renovated     1824186
zipcode          2119758513
lat              1027915.4151
long             -2641408.943
sqft_living15    42935359
sqft_lot15       275964632
dtype: object
houses.sum(numeric_only=True)
id               9.899406e+13
price            1.167293e+10
bedrooms         7.285400e+04
bathrooms        4.570625e+04
sqft_living      4.495287e+07
sqft_lot         3.265069e+08
floors           3.229650e+04
waterfront       1.630000e+02
view             5.064000e+03
condition        7.368800e+04
grade            1.654880e+05
sqft_above       3.865249e+07
sqft_basement    6.300385e+06
yr_built         4.259933e+07
yr_renovated     1.824186e+06
zipcode          2.119759e+09
lat              1.027915e+06
long             -2.641409e+06
sqft_living15    4.293536e+07
sqft_lot15       2.759646e+08
dtype: float64
titanic = pd.read_csv("data/titanic.csv")
titanic.head()
titanic.sum()
pclass       3004
survived     500
name         Allen, Miss. Elisabeth WaltonAllison, Master. ...
sex          femalemalefemalemalefemalemalefemalemalefemale...
age          290.91672302548633953714718242680?245032363747...
sibsp        653
parch        504
ticket       2416011378111378111378111378119952135021120501...
fare         211.3375151.55151.55151.55151.5526.5577.958305...
cabin        B5C22 C26C22 C26C22 C26C22 C26E12D7A36C101?C62...
embarked     SSSSSSSSSCCCCSSSCCCCSSCCSCCCSSSCSSSCSSSCCCSCCS...
boat         211???310?D??496B??68A55548?778D?788?469???6D8...
body         ???135?????22124??????????????148?????????????...
home.dest    St Louis, MOMontreal, PQ / Chesterville, ONMon...
dtype: object
titanic.sum(numeric_only=True)
pclass      3004
survived    500
sibsp       653
parch       504
dtype: int64
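The outputs above show why `numeric_only=True` matters: summing a text column concatenates its strings, which is rarely what you want. A minimal sketch on a made-up frame (the `people` data is hypothetical) reproduces the effect at a readable scale:

```python
import pandas as pd

# Hypothetical toy data mixing text and numeric columns.
people = pd.DataFrame(
    {"name": ["Ann", "Bob"], "sex": ["female", "male"], "sibsp": [1, 0]}
)

everything = people.sum()                # text columns are concatenated
numbers = people.sum(numeric_only=True)  # text columns are skipped

print(everything)
print(numbers)
```

With `numeric_only=True` the result contains only the numeric columns, so it also gets a numeric dtype instead of `object`.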
names = [
    'sumlev', 'region', 'division', 'state', 'name',
    'census2010pop', 'estimatesbase2010',
    'popestimate2010', 'popestimate2011', 'popestimate2012',
    'popestimate2013', 'popestimate2014', 'popestimate2015',
    'popestimate2016', 'popestimate2017', 'popestimate2018',
    'popestimate2019', 'popestimate042020', 'popestimate2020',
]
state_pops = pd.read_csv("data/nst-est2020.csv", names=names, header=0)
state_pops.tail(52).head(51).sum(numeric_only=True)
sumlev               2040
state                1477
census2010pop        308745538
estimatesbase2010    308758105
popestimate2010      309327143
popestimate2011      311583481
popestimate2012      313877662
popestimate2013      316059947
popestimate2014      318386329
popestimate2015      320738994
popestimate2016      323071755
popestimate2017      325122128
popestimate2018      326838199
popestimate2019      328329953
popestimate042020    329398742
popestimate2020      329484123
dtype: int64
netflix = pd.read_csv("data/netflix_titles.csv", sep="|", index_col=0)
netflix.head()
netflix.count()
show_id         8807
type            8807
title           8807
director        6173
cast            7982
country         7976
date_added      8797
release_year    8807
rating          8803
duration        8804
listed_in       8807
description     8807
dtype: int64
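`count()` reports the number of non-null values per column, which is why `director` comes out lower than `title` above. The sketch below, using a made-up `shows` frame, pairs `count()` with `isna().sum()` to see how many values are missing:

```python
import pandas as pd

# Hypothetical toy data with some missing directors.
shows = pd.DataFrame(
    {"title": ["A", "B", "C"], "director": ["X", None, None]}
)

present = shows.count()       # non-null values per column
missing = shows.isna().sum()  # null values per column

print(present)
print(missing)
```

For every column, the two numbers add up to the number of rows in the frame.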

Mean, median, mode

houses.mean(numeric_only=True)
id               4.580302e+09
price            5.400881e+05
bedrooms         3.370842e+00
bathrooms        2.114757e+00
sqft_living      2.079900e+03
sqft_lot         1.510697e+04
floors           1.494309e+00
waterfront       7.541757e-03
view             2.343034e-01
condition        3.409430e+00
grade            7.656873e+00
sqft_above       1.788391e+03
sqft_basement    2.915090e+02
yr_built         1.971005e+03
yr_renovated     8.440226e+01
zipcode          9.807794e+04
lat              4.756005e+01
long             -1.222139e+02
sqft_living15    1.986552e+03
sqft_lot15       1.276846e+04
dtype: float64
titanic.mean(numeric_only=True)
pclass      2.294882
survived    0.381971
sibsp       0.498854
parch       0.385027
dtype: float64
houses.median(numeric_only=True)
id               3.904930e+09
price            4.500000e+05
bedrooms         3.000000e+00
bathrooms        2.250000e+00
sqft_living      1.910000e+03
sqft_lot         7.618000e+03
floors           1.500000e+00
waterfront       0.000000e+00
view             0.000000e+00
condition        3.000000e+00
grade            7.000000e+00
sqft_above       1.560000e+03
sqft_basement    0.000000e+00
yr_built         1.975000e+03
yr_renovated     0.000000e+00
zipcode          9.806500e+04
lat              4.757180e+01
long             -1.222300e+02
sqft_living15    1.840000e+03
sqft_lot15       7.620000e+03
dtype: float64
titanic.median(numeric_only=True)
pclass      3.0
survived    0.0
sibsp       0.0
parch       0.0
dtype: float64
titanic.mode(numeric_only=True)
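Unlike `mean()` and `median()`, `mode()` returns a DataFrame rather than a Series, because a column can have more than one most-frequent value. A minimal sketch with made-up values (the `df` frame is hypothetical) shows the shape of the result:

```python
import pandas as pd

# Hypothetical toy data: "pclass" has one mode, "parch" has two (a tie).
df = pd.DataFrame({"pclass": [1, 3, 3, 2], "parch": [0, 1, 0, 1]})

modes = df.mode(numeric_only=True)

# One row per mode; columns with fewer modes are padded with NaN.
print(modes)
```

To get a single value per column, one option is to take the first row with `modes.iloc[0]`.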

Describe

titanic.describe()
houses.describe()
titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 14 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   pclass     1309 non-null   int64 
 1   survived   1309 non-null   int64 
 2   name       1309 non-null   object
 3   sex        1309 non-null   object
 4   age        1309 non-null   object
 5   sibsp      1309 non-null   int64 
 6   parch      1309 non-null   int64 
 7   ticket     1309 non-null   object
 8   fare       1309 non-null   object
 9   cabin      1309 non-null   object
 10  embarked   1309 non-null   object
 11  boat       1309 non-null   object
 12  body       1309 non-null   object
 13  home.dest  1309 non-null   object
dtypes: int64(4), object(10)
memory usage: 143.3+ KB
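In the `info()` output, columns such as `age` and `fare` are reported as `object` with no missing values, and the concatenated sums shown earlier contain `?` characters. That suggests the file uses `?` as a placeholder for missing data. Assuming that is the case, one possible approach is to pass `na_values="?"` to `read_csv`. The sketch below uses a small inline CSV (made up for illustration) rather than the real file:

```python
import io

import pandas as pd

# Hypothetical CSV text that marks missing values with "?".
csv_text = "name,age,fare\nAnn,29,211.3375\nBob,?,151.55\nCid,0.9167,?\n"

# Read as-is: the "?" entries force age and fare to be non-numeric.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# Treat "?" as missing: age and fare are parsed as floats with NaN.
clean = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(clean.dtypes)
print(clean.mean(numeric_only=True))
```

Once the placeholders are parsed as NaN, `numeric_only=True` aggregations include these columns and `count()` reflects the values that are actually present.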
titanic.describe(include=["object"])
titanic.describe(include=["O"]) # another way to write "object"
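For object columns, `describe` reports `count`, `unique`, `top` (the most frequent value), and `freq` (how often it occurs) instead of numeric statistics. A minimal sketch on a made-up `people` frame, with the text columns explicitly created as `object` dtype:

```python
import pandas as pd

# Hypothetical toy data; text columns are explicitly given object dtype.
people = pd.DataFrame(
    {
        "sex": pd.Series(["female", "male", "male"], dtype=object),
        "embarked": pd.Series(["S", "S", "C"], dtype=object),
        "sibsp": [1, 0, 0],
    }
)

text_summary = people.describe(include=["object"])

# Rows are count, unique, top, freq; the numeric column is left out.
print(text_summary)
```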