Monday, 13 June 2016

Writing netCDF time in Python when converting from ASCII to netCDF format

I am writing a Python script that reads data from an ASCII file generated by a surge model and writes it to a netCDF file. However, I run into problems as soon as I add the time to the file.

The ascii file has the following structure:

-6.000 1.860 1.774 0.064
-5.833 1.900 1.829 0.054
-5.667 1.930 1.868 0.057
-5.500 1.940 1.890 0.070
...

The first column contains the time in hours relative to the date encoded in the file name. The other columns contain different output variables. I was able to write a working script, including the right time attributes, for similar ASCII files (same structure, generated by the same model), but I can't apply the same approach to the new ASCII files.
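To make the time convention concrete, here is a minimal sketch of how the hour offsets map to absolute times. It assumes that the 12 digits after the underscore in the file name follow a YYYYMMDDHHMM pattern; that pattern, and the names `stamp`, `start`, `offsets_h` and `absolute`, are illustrative assumptions and not taken from the real script.

```python
from datetime import datetime, timedelta

filename = 'xyz_200311231800'  # example file name used in the script

# Assumption: the part after the underscore is YYYYMMDDHHMM.
stamp = filename.split('_')[-1]
start = datetime.strptime(stamp, '%Y%m%d%H%M')

# Hour offsets copied from the first column of the sample rows.
offsets_h = [-6.000, -5.833, -5.667, -5.500]

# Each row's absolute time is the start time plus its offset in hours.
absolute = [start + timedelta(hours=h) for h in offsets_h]

for h, t in zip(offsets_h, absolute):
    print(h, t.isoformat())
```

With this convention, an offset of -6.000 refers to the moment six hours before the time stamp in the file name.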

The main part of the script reads as follows:

# Imports
import numpy as np
from netCDF4 import Dataset
import os
from subprocess import call
from datetime import datetime, timedelta
from netCDF4 import num2date, date2num

# definition of start date and time
# (in the original code this is defined by reading the file 
# name as string and cutting it according to the date/time 
# convention of the name)
fyr = '2003'
fmon = '01'
fday = '01'
ftime = '12'

source = '/usrs/.../'            #${path_to_ascii_file}
filename = 'xyz_200311231800'    #name of ascii file

# open netCDF
nc = Dataset("file.nc",'w', format="NETCDF4")

nc.description = 'File containing data read from ascii file'
import time
nc.history     = 'Created ' + time.ctime(time.time())
nc.source      = 'path to ascii source file'

# define time dimension and variables
time_dim = nc.createDimension('time', None)   # named time_dim so the time module is not shadowed

hoursout           = nc.createVariable('hours','f4',('time',),zlib=True,least_significant_digit=3)
times              = nc.createVariable('time', 'f4',('time',),zlib=True)
hoursout.units     = 'hours since 2003-01-13 12:00:00.0'
times.units        = 'hours since 0001-01-01 00:00:00.0'
times.calendar     = 'gregorian'

# reading data from ascii files in directories within root directory 
# that contain data from different measurement stations 
# (station name = directory name)
for root, dirs, filenames in os.walk(source):
    for d in dirs:
        filepath = os.path.join(source, str(d), filename)
        data     = []
        data2    = []

        # read ascii file (the with-block closes the file automatically)
        with open(filepath, 'r') as f2:
            lines = f2.readlines()
        data = [line.split() for line in lines]
        # -> time = data2[:,0]; other vars data2[:,1:]
        data2 = np.asarray(data, dtype=float)

        # Define nc-file group for each station & write data
        station = nc.createGroup(str(d)) 
        var1 = station.createVariable('var1','f4',('time',),fill_value=99.990,zlib=True,least_significant_digit=3)
        #           
        var1.units = 'm'
        var1[:] = data2[:,1]

# time variables 
# Calculate time from date format for netCDF and data 
# in column 1 of ascii file   
timedate         = datetime(int(fyr),int(fmon),int(fday),int(ftime),0)
timenum_calc     = date2num(timedate,units=times.units,calendar=times.calendar)
# note: data2 here holds the data of the last station read in the loop above
timenum_calc_new = timenum_calc + data2[:,0]

# write data
hoursout[:] = data2[:,0]
times[:]    = timenum_calc_new

nc.close()

When I look at the output generated by this script, the variables appear as a "step function", with multiple values at each time point (Fig. 1: ncview output of variable hoursout when using calculated time in netCDF file), instead of as a smooth time series (Fig. 2: ncview output of variable hoursout with the write statement "times[:] = timenum_calc_new" commented out).

Any ideas on what is going wrong? As I said, the same script works perfectly well for similar ASCII files, but it does not work here.
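One thing that may be worth checking is the precision of the 'f4' type used for the time variable. The sketch below is not a confirmed diagnosis. It only illustrates how values of the size of "hours since 0001-01-01" behave when stored as 32-bit floats. The base value is a rough estimate (about 2002 years of hours), the offsets are generated at 10-minute spacing to mimic the first column, and `to_float32` is a hypothetical helper that uses the standard `struct` module to emulate single-precision storage.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest 32-bit float and return it as a float."""
    return struct.unpack('f', struct.pack('f', value))[0]

# Rough estimate of the hours between 0001-01-01 and early 2003 (assumption).
base_hours = 2002 * 365.2425 * 24

# 72 offsets in hours at 10-minute spacing, starting at -6 h.
offsets = [-6.0 + n / 6.0 for n in range(72)]

exact = [base_hours + off for off in offsets]   # 64-bit values
rounded = [to_float32(v) for v in exact]        # what 32-bit storage can hold

n_unique_exact = len(set(exact))
n_unique_rounded = len(set(rounded))

print("distinct 64-bit times:", n_unique_exact)
print("distinct 32-bit times:", n_unique_rounded)
```

At this magnitude, neighbouring 32-bit floats are two hours apart, so many 10-minute steps collapse onto the same stored time, which would look like a step function. If this is the cause, a reference date closer to the data or an 'f8' time variable might avoid it.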

Thanks in advance for the help!
