Tuesday, 14 June 2016

Deleting rows of data for multiple variables

I have over 500 files that I cleaned up with a pandas DataFrame and later read back in as matrices. I now want to delete the rows with missing data from several variables across all of the files. The arrays are large: tc and wspd each have the shape (84479, 558), and pressure has the shape (558,). The following approach has worked for me in the past on one-dimensional arrays of the same shape, but it does not work with a two-dimensional array.

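    import numpy

    # Collect every index where either array holds the -9999 missing-value flag.
    bad = []
    for i in range(len(p)):
        if p[i] == -9999 or tc[i] == -9999:
            bad.append(i)
    # Drop the same indices from both arrays so they stay aligned.
    p = numpy.delete(p, bad)
    tc = numpy.delete(tc, bad)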
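
For comparison, the same one-dimensional clean-up can be written without an explicit loop by using a boolean mask. This is only a minimal sketch: the sample values and the names `good`, `p_clean` and `tc_clean` are made up for illustration, and it assumes -9999 is the only missing-value flag.

```python
import numpy as np

# Hypothetical sample data; -9999 marks a missing reading.
p = np.array([1000.0, -9999.0, 850.0, 700.0])
tc = np.array([25.0, 20.0, -9999.0, 5.0])

# True only where both arrays hold a real value at the same index.
good = (p != -9999) & (tc != -9999)

# Boolean indexing keeps the flagged positions out of both arrays at once.
p_clean = p[good]
tc_clean = tc[good]
print(p_clean)
print(tc_clean)
```

Because one mask is applied to both arrays, they keep the same length and stay aligned.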

I tried the following code instead, but without success.

    import numpy as n
    import pandas as pd

    wspd=pd.read_pickle('/home/wspd').as_matrix()
    tc=pd.read_pickle('/home/tc').as_matrix()

    press=n.load('/home/file1.npz')
    p=press['press']
    names=press['names']

    length=n.arange(0,84479)
    for i in range(len(names[0])): #using the first one as a trial to run faster
        print i #used later to see how far we have come in the 558 files
        bad=[]
        for j in range(len(length)):
            if (wspd[j,i]==n.nan or tc[j,i]==n.nan):
                bad.append(j)
            print bad
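
One likely reason `bad` stays empty in the loop above is that NaN never compares equal to anything, itself included, so a test such as `x == n.nan` is always False. `numpy.isnan` tests for NaN elementwise instead. The sketch below uses small made-up arrays in place of the real (84479, 558) data, and the names `missing` and `bad_rows` are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for the real arrays: 3 rows, 2 files (columns).
wspd = np.array([[1.0, np.nan],
                 [2.0, 4.0],
                 [np.nan, 5.0]])
tc = np.array([[10.0, 11.0],
               [np.nan, 12.0],
               [13.0, 14.0]])

# NaN never compares equal to anything, including itself.
print(np.nan == np.nan)  # False

# Elementwise test: True wherever either array is missing a value.
missing = np.isnan(wspd) | np.isnan(tc)

# Row indices that are bad in each column (one index array per file).
bad_rows = [np.flatnonzero(missing[:, i]) for i in range(missing.shape[1])]
print(bad_rows)
```

Computing the mask on the whole array also avoids the nested Python loops over 84479 rows and 558 columns.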

From there I plan to delete the missing data as I did previously, except that I index the dimension I am deleting from inside my first for loop.

    new_tc=n.delete(tc[j,:], bad)
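
A point worth checking here: `bad` holds row numbers for file `i`, so the slice to delete from would be the column `tc[:, i]` rather than the row `tc[j, :]`. Each file can also lose a different number of rows, so the cleaned columns no longer fit in one rectangular array. The sketch below stores them in plain lists instead. The data are made up, and `new_tc` and `new_wspd` are hypothetical names; it assumes tc and wspd should stay paired row by row.

```python
import numpy as np

# Hypothetical stand-ins: 4 time steps (rows) for 2 files (columns).
tc = np.array([[10.0, np.nan],
               [np.nan, 12.0],
               [13.0, 14.0],
               [15.0, 16.0]])
wspd = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [np.nan, 5.0],
                 [6.0, 7.0]])

new_tc, new_wspd = [], []
for i in range(tc.shape[1]):
    # Rows of column i where either variable is missing.
    bad = np.flatnonzero(np.isnan(tc[:, i]) | np.isnan(wspd[:, i]))
    # Delete those rows from column i of both variables so they stay paired.
    new_tc.append(np.delete(tc[:, i], bad))
    new_wspd.append(np.delete(wspd[:, i], bad))

print([len(col) for col in new_tc])
```

Each list entry is an ordinary one-dimensional array with no NaN values, which can be passed on one file at a time.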

Unfortunately, this has not worked. I have also tried masking the array, which has not worked either.
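
One possible explanation for the masking attempt, offered only as a guess: a masked array hides the invalid entries but still stores them, so a library that reads the underlying data may still see the NaN values. Calling `compressed()` returns just the unmasked values as a plain array. A minimal sketch on a made-up column (`col`, `masked` and `clean` are hypothetical names):

```python
import numpy as np

# Hypothetical column of temperatures with two missing readings.
col = np.array([10.0, np.nan, 13.0, np.nan, 15.0])

# masked_invalid masks NaN and inf entries; the data are not removed yet.
masked = np.ma.masked_invalid(col)
print(masked.count())  # number of unmasked values

# compressed() returns the unmasked values as an ordinary 1-D ndarray.
clean = masked.compressed()
print(clean)
```

Note that `compressed()` always returns a one-dimensional array, so on a two-dimensional array it would need to be applied one column at a time.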

I need to delete the data because the next library I use does not understand NaN values; it strictly requires integers, floats, etc.

I am open to new methods for removing rows of data if anyone has any guidance. I greatly appreciate it.
