Open CSV File

Information, tips and instructions

Read a CSV File Using Pandas in Python

CSV (comma-separated values) is a plain-text file format widely used to store and transfer tabular data. CSV data can include both textual and numeric fields. Each line of the file holds one record, and the values within a record are separated by commas.

Pandas (the name is derived from "panel data") is a powerful data manipulation and processing library written for the Python programming language. It provides many advanced data manipulation and processing features, including:

  • Reading and writing data in different file formats including CSV
  • Pivoting and transforming data sets
  • Group by and split-apply-combine operations
  • Merging and joining of data
  • Processing of higher-dimensional data
  • Time series processing
  • Data filtration
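To give a feel for two of the features listed above, here is a minimal sketch of a group-by (split-apply-combine) operation and a row filter. The small table of names, ages and genders is made up for illustration and is built in memory, so no file is needed.

```python
import pandas as pd

# Illustrative sample data, built in memory (not read from a file)
people = pd.DataFrame({
    "Name": ["John", "Melissa", "Alan", "Chelsey"],
    "Age": [30, 25, 42, 40],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# Split the rows by Gender, apply mean() to Age, combine into one result
mean_age = people.groupby("Gender")["Age"].mean()
print(mean_age)

# Data filtration: keep only the rows where Age is greater than 30
older = people[people["Age"] > 30]
print(older)
```

The variable names "people", "mean_age" and "older" are arbitrary; any DataFrame with a numeric column and a grouping column could be used the same way.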

In this short article we will only examine how to read data from a CSV file into Pandas. Pandas stores data in a DataFrame, a two-dimensional labeled data structure whose columns may have different data types. Once your data is in a DataFrame, part of your job is already done; all that remains is to process the data. Let's look at an example of how to read the following CSV file into a DataFrame.

Name,Age,Gender
John,30,Male
Melissa,25,Female
Alan,42,Male
Chelsey,40,Female

To read this CSV file into a Pandas DataFrame, we will use the following Python code (it assumes the data above is saved as csvfile.csv in the current working directory):

import pandas as pd

df = pd.read_csv('csvfile.csv')

print(df)

The first line of the code above imports the pandas module under the "pd" alias. The second line reads the data from the CSV file into the "df" data frame; notice how simple this line is. The final line prints the data frame in an easy-to-read format. The output is below:

      Name  Age  Gender
0     John   30    Male
1  Melissa   25  Female
2     Alan   42    Male
3  Chelsey   40  Female
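Because the columns of a DataFrame may have different data types, it can be useful to check what Pandas inferred while reading. The sketch below is one way to do that. To stay self-contained it reads the same sample data from an in-memory string through io.StringIO instead of from csvfile.csv; that substitution is an assumption made for this example only.

```python
import io
import pandas as pd

# Same sample data as above, held in a string so no file is required
csv_text = """Name,Age,Gender
John,30,Male
Melissa,25,Female
Alan,42,Male
Chelsey,40,Female
"""

# read_csv accepts a file-like object as well as a file path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # number of rows and columns
print(df.dtypes)  # data type inferred for each column
```

The Age column is inferred as an integer type, while the label shown for the two text columns depends on the installed Pandas version.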

Now let's quickly look at how powerful Pandas is. Suppose we want to select only the Age and Gender columns from the dataset above. For this we just need to add one extra line to the Python code, as shown below:

import pandas as pd

data = pd.read_csv('csvfile.csv')

df = pd.DataFrame(data, columns=['Age','Gender'])

print(df)

As you can see, we added a call to the DataFrame constructor with the columns parameter, where we listed the fields we want to keep in the resulting "df" data frame. The output of the code is shown below:

   Age  Gender
0   30    Male
1   25  Female
2   42    Male
3   40  Female
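The same selection can be written in other ways. The sketch below shows two common alternatives: indexing an existing data frame with a list of column names, and passing the usecols parameter to read_csv so that only the wanted columns are loaded in the first place. As an assumption made for this example, the sample data is read from an in-memory string rather than from csvfile.csv, so the code runs without any file on disk.

```python
import io
import pandas as pd

# Same sample data as above, held in a string so no file is required
csv_text = """Name,Age,Gender
John,30,Male
Melissa,25,Female
Alan,42,Male
Chelsey,40,Female
"""

# Alternative 1: read everything, then select columns by name
data = pd.read_csv(io.StringIO(csv_text))
subset = data[["Age", "Gender"]]
print(subset)

# Alternative 2: load only the wanted columns while reading
only_two = pd.read_csv(io.StringIO(csv_text), usecols=["Age", "Gender"])
print(only_two)
```

The second form can save memory on large files, since the unwanted columns are never stored in the data frame.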