Sunday, June 26, 2016

Basic softmax model implementation on 150x150 images

In my learning of TensorFlow I've tried to adapt the basic softmax MNIST example to work on my own image set. It's aerial photographs of buildings, and I want to classify them by roof type. There are four such classifications that can be made.

The simple (maybe naive) idea was to resize the images (since they're not all the same size) and flatten them, then change the tensor shapes in the code and run it. Of course it doesn't work, though. First let me show you the code.

import csv
import numpy as np
import tensorflow as tf

# Load csv data: column 0 is the image id, column 1 is the roof class (1-4)
filenames = []
_answers = []
with open('/home/david/DSG/id_train.csv') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    for row in csv_reader:
        one_hot_vec = [0, 0, 0, 0]
        one_hot_vec[int(row[1])-1] = 1
        _answers.append(np.asarray(one_hot_vec))
        filenames.append("/home/david/DSG/roof_images/" + str(row[0]) + ".jpg")
answers = np.asarray(_answers)  # stack into one array so batches can be sliced later


sess = tf.InteractiveSession()

# Image Loading and processing
filename_q = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
key, value = reader.read(filename_q)
__img = tf.image.decode_jpeg(value, channels=1)
_img = tf.expand_dims(tf.image.convert_image_dtype(__img, tf.float32),0)
img = tf.image.resize_nearest_neighbor(_img, [150,150])

# Actual model
x = tf.placeholder(tf.float32, [None, 22500])
W = tf.Variable(tf.zeros([22500, 4]))
b = tf.Variable(tf.zeros([4]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Training algorithm
y_ = tf.placeholder(tf.float32, [None, 4])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y,1e-10,1.0)), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Evaluate the model: check the predictions (y) against the known answers (y_)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

coord = tf.train.Coordinator()
init_op = tf.initialize_all_variables()
sess.run(init_op)

threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# Load and process all the images, adding them to an array for later use
images = []
for i in range(8000):
    if i % 100 == 0:
        print("Processing Images " + str(100*(i+100)/8000) + "% complete")
    image = img.eval().flatten()  # pulls the next decoded image from the queue
    images.append(image)

# Train our model
for i in range(80):
    print("Training the Model " + str(100*(i+1)/80) + "% complete")
    batchImages = images[i*100:((i+1)*100)]
    batchAnswers = answers[i*100:((i+1)*100)].astype(float)
    # Here's a debug line I put in to see what the numbers were
    print(sess.run(y, feed_dict={x: batchImages, y_: batchAnswers}))
    sess.run(train_step, feed_dict={x: batchImages, y_: batchAnswers})

coord.request_stop()
coord.join(threads)

As can be seen, I print the y values from softmax as I go along. The result is tensors that exclusively look like this: [0., 0., 0., 1.]. I thought this was pretty strange, so I printed the value of tf.matmul(x, W) + b.

The result was this:

[[-236.86216736 -272.89904785   59.67744446  450.08377075]
 [-327.19482422 -384.06918335   87.47353363  623.79052734]
 [-230.79460144 -264.78787231   60.29759598  435.28485107]
 [-188.10324097 -212.30155945   53.8230629   346.58175659]
 [-180.26617432 -209.45767212   48.90292358  340.82092285]
 [-177.13232422 -200.59474182   45.97179413  331.75531006]
 [-225.94104004 -258.97390747   61.54353333  423.37136841]
 [-259.33599854 -290.73773193   67.69062042  482.38308716]
 [-151.53468323 -174.09906006   39.97481537  285.65893555]
 [-237.23356628 -272.71789551   65.12500763  444.82647705]
 ..... you get the idea
 [-195.14971924 -221.30851746   53.09790802  363.36032104]
 [-157.30508423 -175.47320557   40.4044342   292.37384033]
 [-178.94332886 -203.36262512   47.0838356   335.22219849]
 [-180.61688232 -200.0609436    45.12242508  335.55541992]
 [-145.7559967  -163.06838989   35.25980377  273.56466675]
 [-194.07254028 -213.78709412   53.14990997  354.70977783]
 [-191.92044067 -219.13395691   49.84062958  361.21377563]]

For the first, second, and third elements, calculating softmax manually gives numbers on the order of 1E-200, essentially zero, and then essentially 1 for the fourth element. Since they all follow this pattern, clearly something is wrong.
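
To see why, here's the softmax of the first row of that matrix worked out by hand (a quick numpy sketch, using float32 like TensorFlow does; the numbers are copied from the output above):

import numpy as np

logits = np.array([-236.86216736, -272.89904785, 59.67744446, 450.08377075],
                  dtype=np.float32)
shifted = logits - logits.max()  # subtract the max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum()
print(probs)  # [0. 0. 0. 1.] -- the fourth logit dominates completely in float32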

Now I've checked the inputs: I have my answers as one-hot vectors like [0, 1, 0, 0], and my images are flattened with the values normalized between 0 and 1 (floats). Just like the MNIST example.
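
(For the record, the check was along these lines, a minimal sketch against the images and answers arrays built above:)

# Every label row should be one-hot: exactly one 1, everything else 0
assert all(vec.sum() == 1 for vec in answers)
# Pixels should be float32 in [0, 1] after convert_image_dtype
print(images[0].dtype, images[0].min(), images[0].max())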

I also noticed that in the MNIST example the values from matmul are much smaller, on the order of 1E0. Is that because there are 784 elements in each image, as opposed to 22500? Is this the cause of the problem?
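
Back-of-the-envelope, that seems plausible: each pre-softmax value is a sum with one term per pixel, so with weights of a similar scale and sign the sum grows roughly with the pixel count. A quick illustration with made-up positive weights (not my actual W, just to show the scaling):

import numpy as np

rng = np.random.RandomState(0)
for n_pixels in (784, 22500):
    x = rng.rand(n_pixels)          # pixel values in [0, 1]
    w = rng.rand(n_pixels) * 0.01   # same weight scale for both sizes
    print(n_pixels, x.dot(w))       # roughly 2 for 784, roughly 56 for 22500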

Heck, maybe this will never work for some reason. I need some help.

EDIT: I decided to check whether the image size was having any effect, and sure enough the matmul does give smaller numbers. However, they still exhibit a pattern, so I ran the output through softmax again and got this:

[[  2.12474524e-20   1.00000000e+00   1.10456488e-18   0.00000000e+00]
 [  3.22400550e-21   1.00000000e+00   1.24568592e-19   0.00000000e+00]
 [  2.49283055e-28   1.00000000e+00   6.52334536e-26   0.00000000e+00]
 [  4.73190862e-23   1.00000000e+00   3.71980738e-21   0.00000000e+00]
 [  1.11151765e-26   1.00000000e+00   4.14652626e-24   0.00000000e+00]
 [  2.23096276e-22   1.00000000e+00   7.21511359e-21   0.00000000e+00]
 [  1.41888640e-23   1.00000000e+00   2.13637447e-21   0.00000000e+00]
 [  3.55662848e-17   1.00000000e+00   5.14018079e-16   4.06785808e-33]
 [  8.25783417e-26   1.00000000e+00   2.95267040e-23   0.00000000e+00]
 [  1.09395607e-25   1.00000000e+00   3.76775998e-23   0.00000000e+00]
 [  9.34879669e-13   1.00000000e+00   1.07488766e-11   7.21446627e-25]
 [  3.09687017e-34   1.00000000e+00   5.22547065e-31   0.00000000e+00]
 [  2.10362117e-22   1.00000000e+00   1.31067148e-20   0.00000000e+00]
 [  5.86830220e-23   1.00000000e+00   9.55902033e-21   0.00000000e+00]
 [  9.59656235e-17   1.00000000e+00   2.98987045e-15   7.10348533e-32]
 [  2.33712669e-16   1.00000000e+00   3.26934410e-15   1.55066807e-31]
 [  1.09302052e-27   1.00000000e+00   5.34793657e-25   0.00000000e+00]
 [  1.67101349e-25   1.00000000e+00   1.15098012e-22   0.00000000e+00]
 [  4.46111042e-26   1.00000000e+00   1.23599421e-23   0.00000000e+00]
 [  1.31791856e-24   1.00000000e+00   2.25831162e-22   0.00000000e+00]
 [  2.19408324e-12   1.00000000e+00   5.67631081e-11   1.22608556e-23]]

Something else must be wrong then.
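
Worth noting: in float32, softmax saturates long before the logits look extreme. Gaps of a few tens between logits already print as an exact 1.0, which would explain why the smaller images still show the same winner-take-all pattern. A quick sketch:

import numpy as np

logits = np.array([0.0, 20.0, 50.0], dtype=np.float32)
shifted = logits - logits.max()
probs = np.exp(shifted) / np.exp(shifted).sum()
print(probs)  # [1.9e-22  9.4e-14  1.0] -- saturated with gaps of only 20-50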
