All About Pandas
First, go through the Setup section and click Download below to get the resources for this section.
Download Resources
P A N D A S
Let's learn about pandas. This package is mainly used for file processing and manipulation. Python also has built-in file processing methods, but pandas makes the process simpler, more efficient, and more convenient, so let's dive in.
Let's first look at what we have in our directory. We are going to open these different types of files using pandas and try different things with them.
In [1]:
import os
os.listdir()
Out[1]:
['.ipynb_checkpoints', 'all_about_pandas.ipynb', 'exercise.ipynb', 'supermarkets-commas.txt', 'supermarkets-commas_noheader.txt', 'supermarkets-semi-colons.txt', 'supermarkets.csv', 'supermarkets.json', 'supermarkets.xlsx']
Let's first import pandas.
In [2]:
import pandas
PANDAS SERIES
Let's talk about the pandas Series first. A Series is a pandas data structure that holds a one-dimensional labelled array of values of any data type; it is built on top of a NumPy array. The syntax for creating a Series is:
pandas.Series(data=None, index=None, dtype=None, name=None, copy=False)
Now let's explore this concept.
In [3]:
ds = pandas.Series([1,2,3,4,5,6])
ds
Out[3]:
0 1 1 2 2 3 3 4 4 5 5 6 dtype: int64
As we can see, it prints something like a list of values, where the first column holds the labels (0, 1, 2, 3, 4, 5) and the second column holds the actual values. This is not a list type, though; it's a special pandas object.
In [4]:
type(ds)
Out[4]:
pandas.core.series.Series
We can also specify the index for a Series, and we can pass a plain Python list into the Series as data.
In [5]:
ls = [1,2,3,4]
ds = pandas.Series(ls, index=[11,12,13,14])
# But keep in mind we have to pass exactly as many index values as data values
ds
Out[5]:
11 1 12 2 13 3 14 4 dtype: int64
The index can be whatever we want; it could be of char, int, or float type.
In [6]:
ds = pandas.Series(ls,index=["a","b","c","d"])
ds
Out[6]:
a 1 b 2 c 3 d 4 dtype: int64
Conversion: we can pass different types of data into a Series, such as a list, tuple, or dict. For example:
In [7]:
ls = ('Jeff Bezos', 'Elon Musk','Bernard Arnault', 'Bill Gates', 'Warren Buffett')
ind = [1,2,3,4,5]
ds = pandas.Series(ls, ind)
ds
Out[7]:
1 Jeff Bezos 2 Elon Musk 3 Bernard Arnault 4 Bill Gates 5 Warren Buffett dtype: object
Let's pass dict-type data into a Series. In this case we don't have to pass an index; pandas automatically uses the dict keys as the index.
In [8]:
dic = {"Amazon":'Jeff Bezos',
"Tesla":'Elon Musk',
"Louis Vuitton":'Bernard Arnault',
"Microsoft":'Bill Gates',
"Berkshire Hathaway":'Warren Buffett'}
ds = pandas.Series(dic)
ds
Out[8]:
Amazon Jeff Bezos Tesla Elon Musk Louis Vuitton Bernard Arnault Microsoft Bill Gates Berkshire Hathaway Warren Buffett dtype: object
We can also use an index label to access a particular value, such as:
In [9]:
ds["Amazon"]
Out[9]:
'Jeff Bezos'
But we can also access it by position number (note that recent pandas versions warn about positional access with `ds[0]` when the index is not integer-based):
In [10]:
ds[0]
Out[10]:
'Jeff Bezos'
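To make the intent explicit, pandas provides `.loc` for label-based and `.iloc` for position-based access on a Series. A minimal sketch, using made-up sample data rather than the series above:

```python
import pandas

# Hypothetical series with string labels (sample data for illustration)
ds = pandas.Series({"Amazon": "Jeff Bezos", "Tesla": "Elon Musk"})

# Label-based access
print(ds.loc["Amazon"])   # 'Jeff Bezos'
# Position-based access; works regardless of the label type
print(ds.iloc[0])         # 'Jeff Bezos'
```

Using `.loc`/`.iloc` avoids any ambiguity when labels themselves happen to be integers.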
We can also pass a dataframe column into a Series. Let's first read a file from the local directory.
In [11]:
df=pandas.read_csv("supermarkets.csv")
df
Out[11]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
Now let's make a Series from the Name column of dataframe df, and also name this Series Names.
In [12]:
ds = pandas.Series(df["Name"], name="Names")
ds
Out[12]:
0 Madeira 1 Bready Shop 2 Super River 3 Ben's Shop 4 Sanchez 5 Richvalley Name: Names, dtype: object
Pandas DataFrame
The next very basic thing we need to know is the DataFrame. A DataFrame is a pandas object that stores data in tabular form. Just as we used .Series() to create a Series object, we have .DataFrame() to generate dataframes. Its signature is:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
Let's take an example.
In [13]:
dic = {"Name":["Sam","John","Mack"],
"Age":[19,25,21],
"OCC":["Engineer","Doctor","Accountant"]
}
dic
Out[13]:
{'Name': ['Sam', 'John', 'Mack'],
'Age': [19, 25, 21],
'OCC': ['Engineer', 'Doctor', 'Accountant']}
In [14]:
df=pandas.DataFrame(dic)
df
Out[14]:
| Name | Age | OCC | |
|---|---|---|---|
| 0 | Sam | 19 | Engineer |
| 1 | John | 25 | Doctor |
| 2 | Mack | 21 | Accountant |
Pandas builds a tabular representation of the dictionary variable, but it is important that the lists in the dictionary values all have the same number of items, as in the example above; otherwise we will get an error. Pandas treats the keys as column labels, while row labels are generated automatically as integers. We can change these row labels, but we will see that later in the session.
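To see what "otherwise we will get an error" means in practice, here is a small sketch (with invented data) that catches the ValueError pandas raises for unequal-length value lists:

```python
import pandas

# Dictionary whose value lists have unequal lengths (3 names vs 2 ages)
bad = {"Name": ["Sam", "John", "Mack"], "Age": [19, 25]}

try:
    pandas.DataFrame(bad)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```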
We can also form a DataFrame from a multi-dimensional list, such as:
In [15]:
ls=[['Sam', 'John', 'Mack'],[19, 25, 21],['Engineer', 'Doctor', 'Accountant']]
df=pandas.DataFrame(ls)
df
Out[15]:
| 0 | 1 | 2 | |
|---|---|---|---|
| 0 | Sam | John | Mack |
| 1 | 19 | 25 | 21 |
| 2 | Engineer | Doctor | Accountant |
As we can see, the column labels were generated automatically as integers because we did not specify them. With a multi-dimensional list there is no need to keep all the inner lists the same length like we do with a dictionary; if one of the lists is longer, pandas pads the shorter rows with None values. Let's see an example.
In [16]:
ls=[['Sam', 'John', 'Mack','My Name'],[19, 25, 21],['Engineer', 'Doctor', 'Accountant']]
Consider the array above: the first element of the parent list has 4 elements while the other two have 3. Now let's make a DataFrame of this array and see.
In [17]:
df=pandas.DataFrame(ls)
df
Out[17]:
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 0 | Sam | John | Mack | My Name |
| 1 | 19 | 25 | 21 | None |
| 2 | Engineer | Doctor | Accountant | None |
We can see that in the DataFrame above, column 3 has only one value (My Name); the rest are None.
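Those padded None entries count as missing values, so `isna()` can locate them. A short sketch reusing the same uneven list:

```python
import pandas

ls = [['Sam', 'John', 'Mack', 'My Name'],
      [19, 25, 21],
      ['Engineer', 'Doctor', 'Accountant']]
df = pandas.DataFrame(ls)

# Column 3 holds None for the two shorter rows; isna() flags them
missing = df[3].isna().sum()
print(missing)  # 2
```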
In [18]:
tp=(['Sam', 'John', 'Mack'],[19, 25, 21],['Engineer', 'Doctor', 'Accountant'])
df=pandas.DataFrame(tp)
df
Out[18]:
| 0 | 1 | 2 | |
|---|---|---|---|
| 0 | Sam | John | Mack |
| 1 | 19 | 25 | 21 |
| 2 | Engineer | Doctor | Accountant |
Using a tuple as data works the same as using a list.
It is also not necessary to have multiple lists, or elements of the same type; let's make a DataFrame from a one-dimensional list.
In [19]:
ls=["John",5,5.3]
df=pandas.DataFrame(ls)
df
Out[19]:
| 0 | |
|---|---|
| 0 | John |
| 1 | 5 |
| 2 | 5.3 |
Now let's explore how to label columns and rows. Let's first create a fresh dataframe.
In [20]:
tp=(['Sam', 'John', 'Mack'],[19, 25, 21],['Engineer', 'Doctor', 'Accountant'])
df=pandas.DataFrame(tp)
df
Out[20]:
| 0 | 1 | 2 | |
|---|---|---|---|
| 0 | Sam | John | Mack |
| 1 | 19 | 25 | 21 |
| 2 | Engineer | Doctor | Accountant |
OK, since both the row and column labels here are pandas-generated, we can modify them; there is a parameter we have to pass to achieve this. Let's see an example.
But first, have you noticed that when we create a DataFrame from a dictionary, all the names end up in one column, the ages in another, and the occupations in another, whereas with a list or tuple the names are stored in one row, the ages in another row, and the occupations in another? So first we should arrange name, age, and occupation in columns instead of rows so the data makes sense.
In [21]:
df1=df.T
df1
Out[21]:
| 0 | 1 | 2 | |
|---|---|---|---|
| 0 | Sam | 19 | Engineer |
| 1 | John | 25 | Doctor |
| 2 | Mack | 21 | Accountant |
Now we have all the names, ages, and occupations in columns. DataFrame.T transposes the dataframe, turning columns into rows and rows into columns.
Once we have a dataframe assigned, we also have a function to change row and column names:
Syntax: DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None)
so lets see an example
In [22]:
df1.rename(index={0:"a",1:"b",2:"c"}, columns={0:"Name",1:"Age",2:"Cou"},inplace=True)
df1
Out[22]:
| Name | Age | Cou | |
|---|---|---|---|
| a | Sam | 19 | Engineer |
| b | John | 25 | Doctor |
| c | Mack | 21 | Accountant |
As we can see, we passed the index and column names as dictionaries. That's because the rename method is not only used to rename all columns and rows; we can change anywhere from one to many column and row names with it. That's why we pass the existing column and row names as dictionary keys, so pandas can figure out which names we want to change.
Let's try changing a single column name: let's change Cou to Occupation.
In [23]:
df1.rename(columns={"Cou":"Occupation"}, inplace=True)
df1
Out[23]:
| Name | Age | Occupation | |
|---|---|---|---|
| a | Sam | 19 | Engineer |
| b | John | 25 | Doctor |
| c | Mack | 21 | Accountant |
The inplace parameter makes the change on the working dataframe itself; it defaults to False, which means that if we change anything without passing inplace=True, the working dataframe is left untouched. Let's see an example.
In [24]:
df1.rename(index={"a":"Asif"})
Out[24]:
| Name | Age | Occupation | |
|---|---|---|---|
| Asif | Sam | 19 | Engineer |
| b | John | 25 | Doctor |
| c | Mack | 21 | Accountant |
In [25]:
df1
Out[25]:
| Name | Age | Occupation | |
|---|---|---|---|
| a | Sam | 19 | Engineer |
| b | John | 25 | Doctor |
| c | Mack | 21 | Accountant |
We can see the difference
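As an alternative to inplace=True, you can capture the new dataframe that rename() returns and reassign it; many pandas users prefer this style. A minimal sketch with made-up data:

```python
import pandas

df = pandas.DataFrame([['Sam', 19], ['John', 25]])

# rename() returns a new dataframe; capture it instead of using inplace=True
df = df.rename(columns={0: "Name", 1: "Age"})
print(list(df.columns))  # ['Name', 'Age']
```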
Reading Files
Reading CSV files
You can see the available parameters for read_csv by executing the line below.
In [26]:
pandas.read_csv?
df stands for dataframe; the data imported from any type of file is called a dataframe in pandas.
In [27]:
df1 = pandas.read_csv("supermarkets.csv")
df1
Out[27]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
As we can see, importing data into a dataframe is quite simple with pandas: just pass the file name (if the file is in the same directory) or the file path to the method. Now let's read JSON files.
You can see the available parameters for read_json with pandas.read_json?
In [28]:
df2 = pandas.read_json("supermarkets.json")
df2
Out[28]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
Reading Excel files is similar,
but as we know there may be multiple sheets in an Excel file, so we pass sheet_name, which can be the index number (or name) of the sheet. You can see the available parameters for read_excel with pandas.read_excel?
In [29]:
df3 = pandas.read_excel("supermarkets.xlsx",sheet_name=0)
df3
Out[29]:
| ID | Address | City | State | Country | Supermarket Name | Number of Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
Pandas does not have a dedicated method for reading text files, so we use read_csv for them too. CSV stands for comma-separated values, and since "supermarkets-commas.txt" holds comma-separated values, it makes sense to use read_csv for this type of file.
In [30]:
df4 = pandas.read_csv("supermarkets-commas.txt")
df4
Out[30]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
We have another text file with semicolon-separated values, so let's read that file.
In [31]:
df5 = pandas.read_csv("supermarkets-semi-colons.txt")
df5
Out[31]:
| ID;Address;City;State;Country;Name;Employees | |
|---|---|
| 0 | 1;3666 21st St;San Francisco;CA 94114;USA;Made... |
| 1 | 2;735 Dolores St;San Francisco;CA 94119;USA;Br... |
| 2 | 3;332 Hill St;San Francisco;California 94114;U... |
| 3 | 4;3995 23rd St;San Francisco;CA 94114;USA;Ben'... |
| 4 | 5;1056 Sanchez St;San Francisco;California;USA... |
| 5 | 6;551 Alvarado St;San Francisco;CA 94114;USA;R... |
The output is not satisfying: pandas is not able to recognize the separator in the data because we didn't pass the parameter that tells it to split values on the semicolon. In the comma case we didn't need the sep parameter because sep defaults to a comma, but when a text file uses some other separator, like semicolons, we have to pass sep.
In [32]:
df6 = pandas.read_csv("supermarkets-semi-colons.txt", sep=";")
df6
Out[32]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
As we can see, pandas takes the first line of the file as the column labels, while the row labels are added automatically. But what if we have a file whose first line is not a header, such as "supermarkets-commas_noheader.txt"? Let's try to read this file and see what I mean.
In [33]:
df7 = pandas.read_csv("supermarkets-commas_noheader.txt")
df7
Out[33]:
| 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 | |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 1 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 2 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 3 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 4 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
As we can see, pandas treats the first line of the file as the header. In such a case we can use the header parameter:
In [34]:
df8 = pandas.read_csv("supermarkets-commas_noheader.txt", header = None)
df8
Out[34]:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
So if we specify header=None, pandas numbers the columns instead; but we want real names as headers that describe the data in each column. For this we assign the column names in order, such as:
In [35]:
df8.columns = ["ID","Address","City","Zip","Country","Name","Employee"]
df8
Out[35]:
| ID | Address | City | Zip | Country | Name | Employee | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
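The two steps above (header=None plus assigning df.columns) can also be collapsed into one read_csv call via the names parameter. A small sketch, simulating a headerless file in memory with io.StringIO and invented rows:

```python
import io
import pandas

# Simulated headerless comma-separated content (rows invented here)
text = "1,3666 21st St,San Francisco\n2,735 Dolores St,San Francisco\n"

# names= assigns the column labels while reading, in a single step
df = pandas.read_csv(io.StringIO(text), header=None,
                     names=["ID", "Address", "City"])
print(list(df.columns))
```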
INDEXING
Now we are going to learn row indexing in pandas; let's use the df8 dataframe from above for that.
Let me introduce you to a method called set_index().
Syntax: DataFrame.set_index(
keys,
drop: 'bool' = True,
append: 'bool' = False,
inplace: 'bool' = False,
verify_integrity: 'bool' = False,)
In [36]:
df8.set_index("ID")
Out[36]:
| Address | City | Zip | Country | Name | Employee | |
|---|---|---|---|---|---|---|
| ID | ||||||
| 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
As we can see, the default index column is now removed and the ID column is set as the index. But this method did not modify the existing dataframe df8; if we execute df8, we can see the dataframe is unchanged.
In [37]:
df8
Out[37]:
| ID | Address | City | Zip | Country | Name | Employee | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
To keep these changes we can use another Python variable, or reassign the working variable, such as:
In [38]:
df9 = df8.set_index("ID")
df9
Out[38]:
| Address | City | Zip | Country | Name | Employee | |
|---|---|---|---|---|---|---|
| ID | ||||||
| 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
In [39]:
df9 = df9.set_index("City")
df9
Out[39]:
| Address | Zip | Country | Name | Employee | |
|---|---|---|---|---|---|
| City | |||||
| San Francisco | 3666 21st St | CA 94114 | USA | Madeira | 8 |
| San Francisco | 735 Dolores St | CA 94119 | USA | Bready Shop | 15 |
| San Francisco | 332 Hill St | California 94114 | USA | Super River | 25 |
| San Francisco | 3995 23rd St | CA 94114 | USA | Ben's Shop | 10 |
| San Francisco | 1056 Sanchez St | California | USA | Sanchez | 12 |
| San Francisco | 551 Alvarado St | CA 94114 | USA | Richvalley | 20 |
But pandas can also do that for us: if we call set_index with the inplace parameter, it changes df9 itself, such as:
In [40]:
df9.set_index("Zip", inplace = True)
df9
Out[40]:
| Address | Country | Name | Employee | |
|---|---|---|---|---|
| Zip | ||||
| CA 94114 | 3666 21st St | USA | Madeira | 8 |
| CA 94119 | 735 Dolores St | USA | Bready Shop | 15 |
| California 94114 | 332 Hill St | USA | Super River | 25 |
| CA 94114 | 3995 23rd St | USA | Ben's Shop | 10 |
| California | 1056 Sanchez St | USA | Sanchez | 12 |
| CA 94114 | 551 Alvarado St | USA | Richvalley | 20 |
But as we can see, the City column has disappeared, meaning the old index is discarded when a new one is set. There is also a drop parameter: with drop=False, the column we promote to the index is kept as a regular column too.
In [41]:
df9.set_index("Name", inplace=True, drop=False)
df9
Out[41]:
| Address | Country | Name | Employee | |
|---|---|---|---|---|
| Name | ||||
| Madeira | 3666 21st St | USA | Madeira | 8 |
| Bready Shop | 735 Dolores St | USA | Bready Shop | 15 |
| Super River | 332 Hill St | USA | Super River | 25 |
| Ben's Shop | 3995 23rd St | USA | Ben's Shop | 10 |
| Sanchez | 1056 Sanchez St | USA | Sanchez | 12 |
| Richvalley | 551 Alvarado St | USA | Richvalley | 20 |
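The inverse operation is reset_index(), which moves the current index back into an ordinary column; it's handy when you want to undo a set_index. A minimal sketch with made-up data:

```python
import pandas

df = pandas.DataFrame({"ID": [1, 2], "Name": ["Madeira", "Bready Shop"]})
df = df.set_index("ID")

# reset_index() turns the index back into a regular column
df = df.reset_index()
print(list(df.columns))  # ['ID', 'Name']
```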
MANIPULATION OF DATAFRAMES
There are two types:
1. Label-based indexing
2. Position-based indexing
What are labels? The column and row names in a dataframe are called labels: Address, Country, etc. are column labels, and Madeira, Bready Shop, etc. are row labels in the df9 table above.
Let's read a fresh file for this.
In [42]:
df9 = pandas.read_csv("supermarkets-commas.txt")
df9.set_index("Address", inplace=True)
df9
Out[42]:
| ID | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|
| Address | ||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 |
Now let's first explore
1. Label-based indexing
There is a method called
df.loc["Starting_row_label":"End_row_label", "Starting_column_label":"End_column_label"]
With this method we can extract rows, columns, or even a single value; this method also does not update the working dataframe.
In [43]:
df9.loc["735 Dolores St":"1056 Sanchez St","City":"Name"]
Out[43]:
| City | State | Country | Name | |
|---|---|---|---|---|
| Address | ||||
| 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop |
| 332 Hill St | San Francisco | California 94114 | USA | Super River |
| 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop |
| 1056 Sanchez St | San Francisco | California | USA | Sanchez |
Now we can see that two rows (the first and last) are excluded, as are two columns (ID and Employee); it is like the slicing we used with lists, except that with loc both endpoints of the slice are included. We can also extract a single value: just pass the target row and target column, and the value at the intersection of that row and column is returned, such as:
In [44]:
df9.loc["735 Dolores St","Name"]
Out[44]:
'Bready Shop'
If we want all the entries of a single column, it returns the column's values along with the index values (keep in mind we are using Address as the index column):
In [45]:
df9.loc[:,"Name"]
Out[45]:
Address 3666 21st St Madeira 735 Dolores St Bready Shop 332 Hill St Super River 3995 23rd St Ben's Shop 1056 Sanchez St Sanchez 551 Alvarado St Richvalley Name: Name, dtype: object
We can also convert this output to a list using the list() function.
In [46]:
list(df9.loc[:,"Name"])
Out[46]:
[' Madeira', 'Bready Shop', 'Super River', "Ben's Shop", 'Sanchez', 'Richvalley']
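Besides slices, loc also accepts a list of labels to pick exactly those columns (or rows). A short sketch with invented data:

```python
import pandas

df = pandas.DataFrame({"City": ["San Francisco", "San Francisco"],
                       "Name": ["Madeira", "Sanchez"],
                       "Employees": [8, 12]})

# A list of labels (rather than a slice) selects exactly those columns
subset = df.loc[:, ["City", "Employees"]]
print(list(subset.columns))
```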
That is the basics of extracting data from a dataframe using label-based indexing. Now let's explore the other common way to access or extract data from a dataframe:
2. Position-based indexing
In [47]:
df9
Out[47]:
| ID | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|
| Address | ||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 |
This method works like label-based indexing; let's see an example and then spot the difference. Even the function name is similar, just with an i before loc, i.e. df.iloc[].
In [48]:
df9.iloc[1:4,1:4]
Out[48]:
| City | State | Country | |
|---|---|---|---|
| Address | |||
| 735 Dolores St | San Francisco | CA 94119 | USA |
| 332 Hill St | San Francisco | California 94114 | USA |
| 3995 23rd St | San Francisco | CA 94114 | USA |
As we can see, the column "Name" and the row "1056 Sanchez St" are not included in the output: the end position in the iloc method is exclusive. That is the difference between label-based and position-based indexing; otherwise we can do everything we did with label-based indexing.
In [49]:
df9.iloc[1:4,1:5]
Out[49]:
| City | State | Country | Name | |
|---|---|---|---|---|
| Address | ||||
| 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop |
| 332 Hill St | San Francisco | California 94114 | USA | Super River |
| 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop |
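Like loc, iloc can also pull out a single value by giving one row position and one column position. A minimal sketch with made-up data:

```python
import pandas

df = pandas.DataFrame([["SF", "CA", "USA"], ["LA", "CA", "USA"]])

# iloc extracts a single value: row position 0, column position 2
print(df.iloc[0, 2])  # USA
```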
DELETING ROWS AND COLUMNS
So now let's see how we can delete rows and columns. The method we use for this is:
Syntax:
df9.drop(
labels=None,
axis: 'Axis' = 0,
index=None,
columns=None,
level: 'Level | None' = None,
inplace: 'bool' = False,
errors: 'str' = 'raise'
)
This function also does not update the working dataframe.
In [50]:
df9
Out[50]:
| ID | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|
| Address | ||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 |
Let's delete the column "City" from df9.
In [51]:
df9.drop("City",1)
C:\Users\Patel\AppData\Local\Temp\ipykernel_14288\4148395828.py:1: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only.
df9.drop("City",1)
Out[51]:
| ID | State | Country | Name | Employees | |
|---|---|---|---|---|---|
| Address | |||||
| 3666 21st St | 1 | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | CA 94119 | USA | Bready Shop | 15 |
| 332 Hill St | 3 | California 94114 | USA | Super River | 25 |
| 3995 23rd St | 4 | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | CA 94114 | USA | Richvalley | 20 |
Here we passed two parameters: first the column name, and second the int value 1. This 1 represents columns and 0 represents rows; the value is called the axis. The logic is that drop() treats columns as the y axis (i.e. 1) and rows as the x axis (i.e. 0). Note the FutureWarning above: newer pandas versions require the axis to be passed as a keyword, i.e. df9.drop("City", axis=1).
In [52]:
df9
Out[52]:
| ID | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|
| Address | ||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 |
As we can see, the City column is not deleted from the existing dataframe. There are many ways to make the change stick; reassigning the variable is one of them, but the drop method also allows it via the inplace parameter, which is False by default. We can update the working dataframe with this parameter.
In [53]:
df9.drop("City",1, inplace=True)
df9
C:\Users\Patel\AppData\Local\Temp\ipykernel_14288\2518956333.py:1: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only.
df9.drop("City",1, inplace=True)
Out[53]:
| ID | State | Country | Name | Employees | |
|---|---|---|---|---|---|
| Address | |||||
| 3666 21st St | 1 | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | CA 94119 | USA | Bready Shop | 15 |
| 332 Hill St | 3 | California 94114 | USA | Super River | 25 |
| 3995 23rd St | 4 | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | CA 94114 | USA | Richvalley | 20 |
Now let's see how we can delete a row, as discussed before.
In [54]:
df9.drop("332 Hill St",0)
C:\Users\Patel\AppData\Local\Temp\ipykernel_14288\3688209960.py:1: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only.
df9.drop("332 Hill St",0)
Out[54]:
| ID | State | Country | Name | Employees | |
|---|---|---|---|---|---|
| Address | |||||
| 3666 21st St | 1 | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | CA 94119 | USA | Bready Shop | 15 |
| 3995 23rd St | 4 | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | CA 94114 | USA | Richvalley | 20 |
The mechanics are the same, and we can also use inplace to modify the working dataframe.
But what if we want to delete multiple rows or columns? The answer is simple: we can use position-based indexing for this.
In [55]:
df9.drop(df9.columns[0:2],1)
C:\Users\Patel\AppData\Local\Temp\ipykernel_14288\3033250773.py:1: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only. df9.drop(df9.columns[0:2],1)
Out[55]:
| Country | Name | Employees | |
|---|---|---|---|
| Address | |||
| 3666 21st St | USA | Madeira | 8 |
| 735 Dolores St | USA | Bready Shop | 15 |
| 332 Hill St | USA | Super River | 25 |
| 3995 23rd St | USA | Ben's Shop | 10 |
| 1056 Sanchez St | USA | Sanchez | 12 |
| 551 Alvarado St | USA | Richvalley | 20 |
The ID and State columns are deleted from the dataframe. We can use the same approach for deleting multiple rows.
In [56]:
df9.drop(df9.index[1:3],0)
C:\Users\Patel\AppData\Local\Temp\ipykernel_14288\2734101091.py:1: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only. df9.drop(df9.index[1:3],0)
Out[56]:
| ID | State | Country | Name | Employees | |
|---|---|---|---|---|---|
| Address | |||||
| 3666 21st St | 1 | CA 94114 | USA | Madeira | 8 |
| 3995 23rd St | 4 | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | CA 94114 | USA | Richvalley | 20 |
The rows 735 Dolores St and 332 Hill St are deleted; remember that indexing starts from 0 in Python.
TIP: to see all the row and column labels in a list-like form:
In [57]:
df9.index
Out[57]:
Index(['3666 21st St', '735 Dolores St', '332 Hill St', '3995 23rd St',
'1056 Sanchez St', '551 Alvarado St'],
dtype='object', name='Address')
In [58]:
df9.columns
Out[58]:
Index(['ID', 'State', 'Country', 'Name', 'Employees'], dtype='object')
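Besides slicing df.columns or df.index, drop() also accepts a plain list of labels, which deletes several columns (or rows) at once. A small sketch with invented data, writing axis=1 as a keyword to avoid the FutureWarning seen earlier:

```python
import pandas

df = pandas.DataFrame({"ID": [1, 2], "State": ["CA", "CA"],
                       "Name": ["Madeira", "Sanchez"]})

# A list of labels drops several columns in one call
out = df.drop(["ID", "State"], axis=1)
print(list(out.columns))  # ['Name']
```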
UPDATING AND ADDING NEW COLUMNS AND ROWS
First let's look at adding a new column.
In [59]:
df9 = pandas.read_csv("supermarkets-commas.txt")
df9.set_index("Address", inplace=True)
df9
Out[59]:
| ID | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|
| Address | ||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 |
TIPS: Let's explore some useful methods
In [60]:
df9.shape
Out[60]:
(6, 6)
This prints the row and column counts as a tuple: (Row Count, Column Count)
In [61]:
df9.shape[0]
Out[61]:
6
We can also access the row and column counts individually by indexing into this tuple
There is a method which will transpose the DataFrame, converting columns into rows and rows into columns. The syntax is very simple: just df.T (T means Transpose)
In [62]:
df9_new = df9.T
df9_new
Out[62]:
| Address | 3666 21st St | 735 Dolores St | 332 Hill St | 3995 23rd St | 1056 Sanchez St | 551 Alvarado St |
|---|---|---|---|---|---|---|
| ID | 1 | 2 | 3 | 4 | 5 | 6 |
| City | San Francisco | San Francisco | San Francisco | San Francisco | San Francisco | San Francisco |
| State | CA 94114 | CA 94119 | California 94114 | CA 94114 | California | CA 94114 |
| Country | USA | USA | USA | USA | USA | USA |
| Name | Madeira | Bready Shop | Super River | Ben's Shop | Sanchez | Richvalley |
| Employees | 8 | 15 | 25 | 10 | 12 | 20 |
Here we go, we have successfully transposed the rows into columns
Now back to the topic. Let's say we want to add a column called Continent. We can add a column to our DataFrame directly using
df9['Column name']=[values matching the number of rows we have]
In df9's case we have 6 rows, so we have to pass 6 values; otherwise we will get an error
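A quick sketch (with a small throwaway frame) of what happens when the value count does not match the row count:

```python
import pandas

df = pandas.DataFrame({"Name": ["Madeira", "Bready Shop", "Super River"]})
try:
    # only 2 values for 3 rows: pandas raises ValueError
    df["Continent"] = ["North America", "North America"]
except ValueError as e:
    print("ValueError:", e)
```

The assignment fails and the DataFrame is left unchanged, so no Continent column is added.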
In [63]:
df9["Continent"]=["North America","North America","North America","North America","North America","North America"]
df9
Out[63]:
| ID | City | State | Country | Name | Employees | Continent | |
|---|---|---|---|---|---|---|---|
| Address | |||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 | North America |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 | North America |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 | North America |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 | North America |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 | North America |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 | North America |
As we can see, we have successfully added a new column. If we want the same value for all the rows, typing it out like this is very lengthy, so we can use a trick. Let's delete the Continent column and then add it again using the trick
In [64]:
df9 = df9.drop("Continent", axis=1)
For this we use shape[0] to repeat the incoming value once per row; it's basically simple list multiplication
In [65]:
df9["Continent"]=df9.shape[0]*["North America"]
df9
Out[65]:
| ID | City | State | Country | Name | Employees | Continent | |
|---|---|---|---|---|---|---|---|
| Address | |||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 | North America |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 | North America |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 | North America |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 | North America |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 | North America |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 | North America |
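The `shape[0] * [...]` trick works, but when every row should get the same value, pandas also broadcasts a plain scalar, which may be even simpler (a minimal sketch with a throwaway frame):

```python
import pandas

df = pandas.DataFrame({"Name": ["Madeira", "Bready Shop"]})
df["Continent"] = "North America"  # the scalar is repeated for every row
print(df)
```

Either way, the whole column ends up filled with the same value.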
Now let's see how we can update an existing column's values. As we have seen, df9['Column name'] accesses all the rows in that column. Let's update the newly added Continent column so that it holds the Country column's value combined with the existing value
In [66]:
df9["Continent"]=df9["Country"]+", "+df9["Continent"]
df9
Out[66]:
| ID | City | State | Country | Name | Employees | Continent | |
|---|---|---|---|---|---|---|---|
| Address | |||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 | USA, North America |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 | USA, North America |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 | USA, North America |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 | USA, North America |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 | USA, North America |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 | USA, North America |
If you notice, the syntax for adding a new column and updating an existing one is the same:
df["Column_Name"]=[array of values]
So we can say pandas looks for the column name we pass: if it exists in the DataFrame, it updates the existing column, and if it does not exist, it adds a new one
Note:- So be careful while passing the column name
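This add-or-update behavior can be seen in a tiny sketch (throwaway frame, illustrative names):

```python
import pandas

df = pandas.DataFrame({"A": [1, 2]})
df["A"] = [10, 20]  # "A" already exists, so this updates the column
df["B"] = [3, 4]    # "B" is new, so this adds a column
print(df)
```

A typo in the column name silently creates a new column instead of updating the intended one, which is why the note above matters.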
Adding or updating rows is a little bit tricky: we have to transpose the rows into columns, then add or update them as columns the way we have done so far.
Let's first transpose the rows into columns using the very simple df.T method
In [67]:
df9_t = df9.T
df9_t
Out[67]:
| Address | 3666 21st St | 735 Dolores St | 332 Hill St | 3995 23rd St | 1056 Sanchez St | 551 Alvarado St |
|---|---|---|---|---|---|---|
| ID | 1 | 2 | 3 | 4 | 5 | 6 |
| City | San Francisco | San Francisco | San Francisco | San Francisco | San Francisco | San Francisco |
| State | CA 94114 | CA 94119 | California 94114 | CA 94114 | California | CA 94114 |
| Country | USA | USA | USA | USA | USA | USA |
| Name | Madeira | Bready Shop | Super River | Ben's Shop | Sanchez | Richvalley |
| Employees | 8 | 15 | 25 | 10 | 12 | 20 |
| Continent | USA, North America | USA, North America | USA, North America | USA, North America | USA, North America | USA, North America |
Okay, so we have successfully transposed the rows into columns. Now let's add a new column
In [68]:
df9_t["My Address"]=[7,"My City","My State","My Country","My Name",15,"My Continent"]
df9_t
Out[68]:
| Address | 3666 21st St | 735 Dolores St | 332 Hill St | 3995 23rd St | 1056 Sanchez St | 551 Alvarado St | My Address |
|---|---|---|---|---|---|---|---|
| ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| City | San Francisco | San Francisco | San Francisco | San Francisco | San Francisco | San Francisco | My City |
| State | CA 94114 | CA 94119 | California 94114 | CA 94114 | California | CA 94114 | My State |
| Country | USA | USA | USA | USA | USA | USA | My Country |
| Name | Madeira | Bready Shop | Super River | Ben's Shop | Sanchez | Richvalley | My Name |
| Employees | 8 | 15 | 25 | 10 | 12 | 20 | 15 |
| Continent | USA, North America | USA, North America | USA, North America | USA, North America | USA, North America | USA, North America | My Continent |
Now that we have successfully added a new column to the df9_t table, it's time to transpose it back to its original shape, so we update our df9 DataFrame from this new DataFrame df9_t
In [69]:
df9 = df9_t.T
df9
Out[69]:
| ID | City | State | Country | Name | Employees | Continent | |
|---|---|---|---|---|---|---|---|
| Address | |||||||
| 3666 21st St | 1 | San Francisco | CA 94114 | USA | Madeira | 8 | USA, North America |
| 735 Dolores St | 2 | San Francisco | CA 94119 | USA | Bready Shop | 15 | USA, North America |
| 332 Hill St | 3 | San Francisco | California 94114 | USA | Super River | 25 | USA, North America |
| 3995 23rd St | 4 | San Francisco | CA 94114 | USA | Ben's Shop | 10 | USA, North America |
| 1056 Sanchez St | 5 | San Francisco | California | USA | Sanchez | 12 | USA, North America |
| 551 Alvarado St | 6 | San Francisco | CA 94114 | USA | Richvalley | 20 | USA, North America |
| My Address | 7 | My City | My State | My Country | My Name | 15 | My Continent |
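The transpose round-trip works, but as an alternative sketch, `df.loc[new_label] = [...]` adds a row directly without transposing, assuming the label is not already in the index (in which case it would update that row instead):

```python
import pandas

df = pandas.DataFrame({"Name": ["Madeira"], "Employees": [8]},
                      index=pandas.Index(["3666 21st St"], name="Address"))
df.loc["My Address"] = ["My Name", 15]  # new index label, so a new row is appended
print(df)
```

The same add-or-update rule applies to row labels with `.loc` as it does to column names.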
CONVERT ADDRESS INTO COORDINATES¶
For this, let's load a fresh DataFrame from the CSV file
In [70]:
df=pandas.read_csv("supermarkets.csv")
df
Out[70]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
First, let's get through the Geopy package. Geopy is a package that converts a string address into longitude and latitude: it sends the address string to a server and figures out the coordinates. Let's play with it a bit to understand
import geopy
dir(geopy)
Using this we can see all the modules in it. Here we only need the ArcGIS geocoder, so let's import it first
In [71]:
from geopy.geocoders import ArcGIS
loc = ArcGIS() #Creating instance
Now if we pass an address, we get the full address along with the latitude and longitude that we need
In [72]:
l = loc.geocode("3995 23rd St, San Francisco, CA 94114")
l
Out[72]:
Location(3995 23rd St, San Francisco, California, 94114, (37.75292099814533, -122.43170295955227, 0.0))
So in the last parentheses we have the Latitude, Longitude, and altitude. We can also extract the latitude and longitude from this Location object
In [73]:
l.longitude
Out[73]:
-122.43170295955227
In [74]:
l.latitude
Out[74]:
37.75292099814533
The variable l is not a string; it is a special Location object from geopy
In [75]:
type(l)
Out[75]:
geopy.location.Location
In [76]:
df
Out[76]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St | San Francisco | CA 94114 | USA | Richvalley | 20 |
So, as we have seen, geopy accepts the address as a string, while our CSV data is split across different columns. First, let's concatenate all the required data into the existing Address column. The geocoder accepts addresses in the format address, city, state, country
In [77]:
df["Address"]=df["Address"]+", "+df["City"]+", "+df["State"]+", "+df["Country"]
df
Out[77]:
| ID | Address | City | State | Country | Name | Employees | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Madeira | 8 |
| 1 | 2 | 735 Dolores St, San Francisco, CA 94119, USA | San Francisco | CA 94119 | USA | Bready Shop | 15 |
| 2 | 3 | 332 Hill St, San Francisco, California 94114, USA | San Francisco | California 94114 | USA | Super River | 25 |
| 3 | 4 | 3995 23rd St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Ben's Shop | 10 |
| 4 | 5 | 1056 Sanchez St, San Francisco, California, USA | San Francisco | California | USA | Sanchez | 12 |
| 5 | 6 | 551 Alvarado St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Richvalley | 20 |
Alright, now we have the full address in the Address column, so we can feed this column to the geocoder to get coordinates. You are probably thinking of a for loop to get every address's coordinates, but no: pandas has a method called apply() that does the looping for us. Let's take a look at how it works. We already created loc as an ArcGIS instance, so let's create a new Coordinates column using the .apply() method
In [78]:
df["Coordinates"]=df["Address"].apply(loc.geocode)
df
Out[78]:
| ID | Address | City | State | Country | Name | Employees | Coordinates | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Madeira | 8 | (3666 21st St, San Francisco, California, 9411... |
| 1 | 2 | 735 Dolores St, San Francisco, CA 94119, USA | San Francisco | CA 94119 | USA | Bready Shop | 15 | (735 Dolores St, San Francisco, California, 94... |
| 2 | 3 | 332 Hill St, San Francisco, California 94114, USA | San Francisco | California 94114 | USA | Super River | 25 | (332 Hill St, San Francisco, California, 94114... |
| 3 | 4 | 3995 23rd St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Ben's Shop | 10 | (3995 23rd St, San Francisco, California, 9411... |
| 4 | 5 | 1056 Sanchez St, San Francisco, California, USA | San Francisco | California | USA | Sanchez | 12 | (1056 Sanchez St, San Francisco, California, 9... |
| 5 | 6 | 551 Alvarado St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Richvalley | 20 | (551 Alvarado St, San Francisco, California, 9... |
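To see what `apply()` is doing without the network call, here is a sketch with a plain Python function standing in for `loc.geocode` (the `fake_geocode` helper is hypothetical, purely for illustration):

```python
import pandas

df = pandas.DataFrame({"Address": ["3666 21st St", "735 Dolores St"]})

def fake_geocode(address):
    # hypothetical stand-in for loc.geocode: returns a fixed tuple
    # instead of contacting the ArcGIS server
    return (37.75, -122.43)

# apply() calls fake_geocode once per row, no explicit for loop needed
df["Coordinates"] = df["Address"].apply(fake_geocode)
print(df)
```

With the real `loc.geocode`, each call goes over the network, so expect the cell to take a few seconds for larger frames.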
The Coordinates column holds Location objects, so we cannot see the longitude and latitude directly; let's create separate columns for these two values
In [79]:
df["Longitudes"] = df["Coordinates"].apply(lambda x: x.longitude if x is not None else None)
df
Out[79]:
| ID | Address | City | State | Country | Name | Employees | Coordinates | Longitudes | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Madeira | 8 | (3666 21st St, San Francisco, California, 9411... | -122.429411 |
| 1 | 2 | 735 Dolores St, San Francisco, CA 94119, USA | San Francisco | CA 94119 | USA | Bready Shop | 15 | (735 Dolores St, San Francisco, California, 94... | -122.425373 |
| 2 | 3 | 332 Hill St, San Francisco, California 94114, USA | San Francisco | California 94114 | USA | Super River | 25 | (332 Hill St, San Francisco, California, 94114... | -122.428813 |
| 3 | 4 | 3995 23rd St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Ben's Shop | 10 | (3995 23rd St, San Francisco, California, 9411... | -122.431703 |
| 4 | 5 | 1056 Sanchez St, San Francisco, California, USA | San Francisco | California | USA | Sanchez | 12 | (1056 Sanchez St, San Francisco, California, 9... | -122.430006 |
| 5 | 6 | 551 Alvarado St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Richvalley | 20 | (551 Alvarado St, San Francisco, California, 9... | -122.433234 |
You may not have fully understood the lambda function above. We used a lambda because we want to receive each Location object as a parameter and then access its .longitude attribute. We can also do the same with a user-defined function, which may be easier to follow
In [80]:
def latitude(coordinates):
    if coordinates:
        return coordinates.latitude
    else:
        return None

df["Latitude"] = df["Coordinates"].apply(latitude)
df
Out[80]:
| ID | Address | City | State | Country | Name | Employees | Coordinates | Longitudes | Latitude | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3666 21st St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Madeira | 8 | (3666 21st St, San Francisco, California, 9411... | -122.429411 | 37.756632 |
| 1 | 2 | 735 Dolores St, San Francisco, CA 94119, USA | San Francisco | CA 94119 | USA | Bready Shop | 15 | (735 Dolores St, San Francisco, California, 94... | -122.425373 | 37.757829 |
| 2 | 3 | 332 Hill St, San Francisco, California 94114, USA | San Francisco | California 94114 | USA | Super River | 25 | (332 Hill St, San Francisco, California, 94114... | -122.428813 | 37.755845 |
| 3 | 4 | 3995 23rd St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Ben's Shop | 10 | (3995 23rd St, San Francisco, California, 9411... | -122.431703 | 37.752921 |
| 4 | 5 | 1056 Sanchez St, San Francisco, California, USA | San Francisco | California | USA | Sanchez | 12 | (1056 Sanchez St, San Francisco, California, 9... | -122.430006 | 37.752132 |
| 5 | 6 | 551 Alvarado St, San Francisco, CA 94114, USA | San Francisco | CA 94114 | USA | Richvalley | 20 | (551 Alvarado St, San Francisco, California, 9... | -122.433234 | 37.753582 |
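Instead of two separate `apply()` passes, both values can be pulled out in one go by returning a `pandas.Series` from the applied function. This sketch uses a small `Point` namedtuple as a stand-in for geopy's `Location` (which also exposes `.latitude` and `.longitude`), so it runs without a network call:

```python
import pandas
from collections import namedtuple

# Point is a hypothetical stand-in for geopy.location.Location
Point = namedtuple("Point", ["latitude", "longitude"])

df = pandas.DataFrame({"Coordinates": [Point(37.75, -122.43), None]})

def unpack(coords):
    # returning a Series lets one apply() fill two columns at once;
    # a None coordinate (failed geocode) becomes None in both columns
    if coords is None:
        return pandas.Series({"Latitude": None, "Longitude": None})
    return pandas.Series({"Latitude": coords.latitude,
                          "Longitude": coords.longitude})

df[["Latitude", "Longitude"]] = df["Coordinates"].apply(unpack)
print(df)
```

The None guard matters in practice because the geocoder returns None for addresses it cannot resolve.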
Section END¶
Blog By Patel Aasif Khan