Write a CSV File Using Pandas in Python

CSV (comma-separated values) is a file format widely used to store and transfer tabular data as plain text. A CSV file can contain both textual and numeric fields. Each line holds one record, and the values within a line are separated by commas.
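As a small illustration of mixed textual and numeric fields, here is a minimal sketch that parses a short CSV string with pandas. The two rows are borrowed from the sample data used later in this article. Wrapping the text in io.StringIO is only a convenience so that no file is needed.

```python
import io

import pandas as pd

# A tiny CSV document held in memory: one text column and one numeric column.
csv_text = "Name,Age\nJohn,30\nMelissa,25\n"

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
people = pd.read_csv(io.StringIO(csv_text))

# Name holds text, while Age is parsed as integers.
print(people.dtypes)
```

pandas infers the type of each column from its values, so numeric fields become numbers without any extra conversion step.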

Pandas (the name is derived from "panel data") is a powerful data manipulation and analysis library for the Python programming language. It provides many advanced data manipulation and processing features, including:

  • Reading and writing data in different file formats including CSV
  • Pivoting and transforming data sets
  • Group-by and split-apply-combine operations
  • Merging and joining of data
  • Processing of higher-dimensional data
  • Time series processing
  • Data filtration
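To give a feel for two of the features listed above, here is a short sketch of a group-by (split-apply-combine) operation and a simple filter. It builds the same sample data that appears later in this article directly in memory. The variable names are illustrative only.

```python
import pandas as pd

# Same sample data as the article's csvfile.csv, built in memory for brevity.
people = pd.DataFrame({
    "Name": ["John", "Melissa", "Alan", "Chelsey"],
    "Age": [30, 25, 42, 40],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# Split-apply-combine: split the rows by Gender, average Age in each group, combine the results.
mean_age = people.groupby("Gender")["Age"].mean()
print(mean_age)

# Filtering: keep only the rows where Age is greater than 35.
over_35 = people[people["Age"] > 35]
print(over_35)
```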

In this short article we will only look at how to write data from pandas to a CSV file. Data in pandas is stored in a DataFrame, a two-dimensional labeled data structure whose columns can have different data types. Once you have processed the data and have the resulting DataFrame, writing it to CSV takes a single method call. Let's look at an example of how to write a DataFrame to CSV.
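A minimal sketch of what "two-dimensional, labeled, with per-column types" means in practice follows. It also relies on a documented detail of to_csv: when no file path is given, the method returns the CSV text as a string, which is handy for a quick look at the output. The data here is a two-row excerpt of the sample used below.

```python
import pandas as pd

# Labeled rows (the index) and labeled columns, each column with its own data type.
df = pd.DataFrame({"Name": ["John", "Melissa"], "Age": [30, 25]})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one data type per column

# With no file path, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```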

First, assume we have the following file, csvfile.csv, which we want to load into a DataFrame.

Name,Age,Gender
John,30,Male
Melissa,25,Female
Alan,42,Male
Chelsey,40,Female
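To follow along, you need this data saved as csvfile.csv. One way to create it is a few lines of standard-library Python. This sketch assumes you want the file in the current working directory, and the name SAMPLE_CSV is only illustrative.

```python
from pathlib import Path

# The sample data shown above, exactly as it should appear in csvfile.csv.
SAMPLE_CSV = """Name,Age,Gender
John,30,Male
Melissa,25,Female
Alan,42,Male
Chelsey,40,Female
"""

# Write the file into the current working directory so the snippets below can read it.
Path("csvfile.csv").write_text(SAMPLE_CSV, encoding="utf-8")
```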

To read this CSV file into a pandas DataFrame, we use the following Python code:

import pandas as pd

df = pd.read_csv('csvfile.csv')

print(df)

The first line of the code above imports the pandas module under the alias "pd". The second line reads the data from the CSV file into the DataFrame "df"; notice how simple this line is. The final line prints the DataFrame in an easy-to-read format. The output is below:

      Name  Age  Gender
0     John   30    Male
1  Melissa   25  Female
2     Alan   42    Male
3  Chelsey   40  Female
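The numbers 0 to 3 in the left-hand column are not part of the file. They are the row index that pandas adds, while the file's first line becomes the column labels. The short sketch below checks both. It reads the same data from an in-memory string rather than from csvfile.csv so that it runs on its own.

```python
import io

import pandas as pd

# Same contents as csvfile.csv; StringIO keeps this snippet independent of the file.
csv_text = "Name,Age,Gender\nJohn,30,Male\nMelissa,25,Female\nAlan,42,Male\nChelsey,40,Female\n"
df = pd.read_csv(io.StringIO(csv_text))

# The header line became the column labels, and pandas added a 0-based row index.
print(list(df.columns))
print(list(df.index))

# Age was parsed as integers, so numeric operations work directly.
print(df["Age"].max())
```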

Now let's add a line that performs a quick transformation on the data we read and puts the result into a different DataFrame.

import pandas as pd

data = pd.read_csv('csvfile.csv')

df = pd.DataFrame(data, columns=['Age','Gender'])

print(df)

As you can see, we added a call to the DataFrame constructor with the columns parameter, which lists the columns we want to keep in the resulting "df" DataFrame. The output of the code is shown below:

   Age  Gender
0   30    Male
1   25  Female
2   42    Male
3   40  Female
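Passing a DataFrame to the DataFrame constructor with columns works, but two other approaches are common: selecting columns with a list of labels, or asking read_csv to load only some columns through its usecols parameter. The sketch below compares the approaches on the same sample data (read from a string so it is self-contained). It is one possible style, not the only correct one.

```python
import io

import pandas as pd

csv_text = "Name,Age,Gender\nJohn,30,Male\nMelissa,25,Female\nAlan,42,Male\nChelsey,40,Female\n"
data = pd.read_csv(io.StringIO(csv_text))

# The approach used in the article: DataFrame constructor with the columns parameter.
via_constructor = pd.DataFrame(data, columns=["Age", "Gender"])

# Alternative 1: select columns from an existing DataFrame with a list of labels.
selected = data[["Age", "Gender"]]

# Alternative 2: load only the needed columns in the first place.
loaded = pd.read_csv(io.StringIO(csv_text), usecols=["Age", "Gender"])

print(selected)
```

usecols can also save memory on large files, because the unneeded columns are never kept.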

Now for the final step: writing the contents of the "df" DataFrame to a CSV file. This only requires adding a call to the "to_csv" method, as shown below.

import pandas as pd

data = pd.read_csv('csvfile.csv')

df = pd.DataFrame(data, columns=['Age','Gender'])

print(df)

df.to_csv('out.csv', index=False)

The to_csv call writes the data to the file "out.csv". The index=False parameter tells pandas not to write the row index (the row numbers) to the CSV. The resulting CSV file is shown below:

Age,Gender
30,Male
25,Female
42,Male
40,Female
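To see what index=False changes, the sketch below writes the same DataFrame both ways (to strings rather than files) and reads each result back. With the default index=True, the row numbers become an extra unnamed first column. When that file is read again, pandas treats it as an ordinary data column named "Unnamed: 0".

```python
import io

import pandas as pd

df = pd.DataFrame({"Age": [30, 25, 42, 40],
                   "Gender": ["Male", "Female", "Male", "Female"]})

# Default (index=True): the row labels 0..3 are written as an extra, unnamed first column.
with_index = df.to_csv()

# index=False: only the Age and Gender columns are written, as in out.csv above.
without_index = df.to_csv(index=False)

# Reading each version back shows the difference.
reread_with_index = pd.read_csv(io.StringIO(with_index))
reread_without_index = pd.read_csv(io.StringIO(without_index))

print(list(reread_with_index.columns))
print(reread_without_index.equals(df))
```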

As you can see, writing a CSV file from pandas is very easy. For the full list of to_csv options, refer to the pandas documentation for DataFrame.to_csv.
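Besides index, to_csv accepts many other parameters. Here is a brief sketch of three commonly used ones: sep, columns and header. Each call returns a string because no file path is given. The two-row DataFrame is just a small excerpt of the sample data.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Melissa"],
                   "Age": [30, 25],
                   "Gender": ["Male", "Female"]})

# sep: use a semicolon instead of a comma as the field delimiter.
semicolons = df.to_csv(sep=";", index=False)

# columns: write only a subset of columns, in the order given.
subset = df.to_csv(columns=["Gender", "Name"], index=False)

# header=False: omit the row of column names.
no_header = df.to_csv(header=False, index=False)

print(semicolons)
print(subset)
print(no_header)
```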