mercredi 29 juin 2016

Convert datatime to milliseconds since midnight UTC or localized in CSV file using Pandas

import pandas as pd
import numpy as np
from datetime import datetime, time


# history file and batch size for processing.

historyFilePath = 'EURUSD.SAMPLE.csv'
batch_size = 5000


# function for date parsing
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')


# load data into a pandas iterator with all the chunks
ratesFromCSVChunks = pd.read_csv(historyFilePath, index_col=0, engine='python', parse_dates=True,
                                 date_parser=dateparse, header=None,
                                 names=["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"],
                                 iterator=True,
                                 chunksize=batch_size)



# concatenate chunks to get the final array
ratesFromCSV = pd.concat([chunk for chunk in ratesFromCSVChunks])

# save final csv file
df.to_csv('EURUSD_processed.csv', date_format='%Y-%m-%d %H:%M:%S.%f',
             columns=['1_Current', '2_BidPx', '3_BidSz', '4_AskPx', '5_AskSz'], header=False, float_format='%.5f')

I am reading a CSV file containing forex data in the format

    2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000
    2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:02.000000,1.33910,1.33910,1.00000,1.33930,1.00000
    2014-08-17 17:00:02.000000,1.33900,1.33900,1.00000,1.33940,1.00000
    2014-08-17 17:00:04.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:05.000000,1.33930,1.33930,1.00000,1.33950,1.00000
    2014-08-17 17:00:06.000000,1.33920,1.33920,1.00000,1.33960,1.00000
    2014-08-17 17:00:06.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:08.000000,1.33900,1.33900,1.00000,1.33942,1.00000
    2014-08-17 17:00:16.000000,1.33900,1.33900,1.00000,1.33940,1.00000

How do you convert from Datatime in the CSV file or pandas dataframe being read to EPOCH time in MILLISECONDS from MIDNIGHT ( UTC or localized ) by the time it is being saved. Each file Starts at Midnight every day . The only thing being changed is the format of datetime to miilliseconds from midnight every day( UTC or localized) . The format i am looking for is:

    43264234, 1.33910,1.33910,1.00000,1.33930,1.00000
    43264739, 1.33910,1.33910,1.00000,1.33950,1.00000
    43265282, 1.33910,1.33910,1.00000,1.33930,1.00000
    43265789, 1.33900,1.33900,1.00000,1.33940,1.00000
    43266318, 1.33910,1.33910,1.00000,1.33950,1.00000
    43266846, 1.33930,1.33930,1.00000,1.33950,1.00000
    43267353, 1.33920,1.33920,1.00000,1.33960,1.00000
    43267872, 1.33910,1.33910,1.00000,1.33950,1.00000
    43268387, 1.33900,1.33900,1.00000,1.33942,1.00000

Any help is well appreciated ( short & precise in Python 3.5 or Python 3.4 and above with Pandas 0.18.1 and numpy 1.11 )

Aucun commentaire:

Enregistrer un commentaire