/**
 * Two-layer neural network trained on the XOR problem.
 *
 * @author Deus Jeraldy
 * @Email: [email protected]
 * BSD License
 */
// np.java -> https://gist.github.com/Jeraldy/7d4262db0536d27906b1e397662512bc
import java.util.Arrays;

public class NN {

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}}; // four training examples
        double[][] Y = {{0}, {1}, {1}, {0}};             // XOR labels
        int m = 4;       // number of training examples
        int nodes = 400; // hidden layer size

        X = np.T(X); // [2][4]: examples as columns
        Y = np.T(Y); // [1][4]

        double[][] W1 = np.random(nodes, 2);  // layer-1 weights [400][2]
        double[][] b1 = new double[nodes][m]; // layer-1 biases  [400][4]
        double[][] W2 = np.random(1, nodes);  // layer-2 weights [1][400]
        double[][] b2 = new double[1][m];     // layer-2 biases  [1][4]

        for (int i = 0; i < 4000; i++) {
            // Forward prop
            // LAYER 1
            double[][] Z1 = np.add(np.dot(W1, X), b1);
            double[][] A1 = np.sigmoid(Z1);
            // LAYER 2
            double[][] Z2 = np.add(np.dot(W2, A1), b2);
            double[][] A2 = np.sigmoid(Z2);

            double cost = np.cross_entropy(m, Y, A2);
            //costs.getData().add(new XYChart.Data(i, cost));

            // Back prop
            // LAYER 2
            double[][] dZ2 = np.subtract(A2, Y);
            double[][] dW2 = np.divide(np.dot(dZ2, np.T(A1)), m);
            double[][] db2 = np.divide(dZ2, m);
            // LAYER 1
            double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(1.0, np.power(A1, 2)));
            double[][] dW1 = np.divide(np.dot(dZ1, np.T(X)), m);
            double[][] db1 = np.divide(dZ1, m);

            // Gradient descent update, learning rate 0.01
            W1 = np.subtract(W1, np.multiply(0.01, dW1));
            b1 = np.subtract(b1, np.multiply(0.01, db1));
            W2 = np.subtract(W2, np.multiply(0.01, dW2));
            b2 = np.subtract(b2, np.multiply(0.01, db2));

            if (i % 400 == 0) {
                // print is np.print (statically imported from np.java)
                print("==============");
                print("Cost = " + cost);
                print("Predictions = " + Arrays.deepToString(A2));
            }
        }
    }
}
Thanks for spotting that. I used it to plot the losses on a chart; you can just comment it out.
For np.print, I imported it as "import static np.print;" so I can just call print("val").
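For context: the XYChart in that commented-out line is JavaFX's javafx.scene.chart.XYChart. Below is a minimal sketch of the series that line feeds, assuming JavaFX is on the classpath; it is only an illustration, not the exact plotting code from the app.

import javafx.scene.chart.XYChart;

public class CostSeriesSketch {
    public static void main(String[] args) {
        // XYChart.Series and XYChart.Data are plain data holders, so they can be
        // built without launching a JavaFX window; a LineChart would display them.
        XYChart.Series<Number, Number> costs = new XYChart.Series<>();
        costs.setName("training cost");
        costs.getData().add(new XYChart.Data<>(0, 0.69));
        costs.getData().add(new XYChart.Data<>(400, 0.29));
        System.out.println(costs.getData().size() + " points buffered"); // 2 points buffered
    }
}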
I think
double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(1.0, np.power(A1, 2)));
should probably be
double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(A1, np.power(A1, 2)));
theoretically?
Great article! Thanks!
It would be very useful to have meaningful names for the variables instead of X, Y, m... it would make it much more readable for beginners like me. I will try to suggest some with a pull request (I hope in the near future).
Thanks
I also get dZ1 = dZ2 W2 (1-A1)A1.
I did the derivatives many times.
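Spelling it out with the np helpers from the gist, this is a sketch of just that chain-rule step, not a full patch:

// For a sigmoid hidden layer, dA1/dZ1 = A1*(1 - A1), so the chain rule gives
//   dZ1 = (W2^T . dZ2) element-wise A1*(1 - A1) = (W2^T . dZ2) element-wise (A1 - A1^2).
// The (1.0 - A1^2) form in the gist is the derivative of tanh, not of sigmoid.
double[][] dZ1 = np.multiply(
        np.dot(np.T(W2), dZ2),            // [nodes][m]: error signal reaching layer 1
        np.subtract(A1, np.power(A1, 2))  // [nodes][m]: sigmoid derivative A1*(1-A1)
);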
Hi, it helped me, but how do you test this network?
Hi, when I try to test the NN, it misbehaves. I am preserving W1, b1, W2, and b2, the weights and biases learned during backprop; the As and Zs are recalculated for the test input. Here is the output with your original XOR training data and the test result I added in main().
I don't understand why the single test behaves differently.
Single pair of X1 X2 testing:

Expected results:
X1  X2  Prediction
 0   1  > .9
 1   1  < .1
 1   0  > .9
 0   0  < .1

Actual results:
X1  X2  Prediction  Analysis
 0   1  .01         Wrong
 1   1  .02         Right
 1   0  .02         Wrong
 0   0  .02         Right
Output follows from the test for 0,1
run:
Cost = NaN
Predictions = [[1.0, 1.0, 1.0, 1.0]]
Cost = 0.2969025436010251
Predictions = [[0.29144265296508315, 0.8154787569733192, 0.6852817198123105, 0.22985828140028192]]
Cost = 0.16802804016304362
Predictions = [[0.1538408192545183, 0.8621884062632224, 0.8315669091608777, 0.15830654731289984]]
Cost = 0.11413292681227041
Predictions = [[0.100817252589133, 0.8976364304662805, 0.8876819432980427, 0.11585198272332121]]
Cost = 0.08493924982058358
Predictions = [[0.07375924489343984, 0.9207673719914042, 0.9167687321976236, 0.08943320369408177]]
Cost = 0.06689411340161885
Predictions = [[0.05758306011535834, 0.9362877197000213, 0.9344502020033382, 0.07192206109098881]]
Cost = 0.05476093241324496
Predictions = [[0.0469131616725125, 0.9471897271303298, 0.9462716036200162, 0.05965924778335137]]
Cost = 0.046106028390985966
Predictions = [[0.03938979283558176, 0.9551735726479308, 0.9546934657492366, 0.050680984981294905]]
Cost = 0.03965499406430109
Predictions = [[0.03382311552031198, 0.961227563955176, 0.960974113482648, 0.04386840008668915]]
Cost = 0.03468063619422532
Predictions = [[0.029551254900865218, 0.9659518219563465, 0.9658230501302526, 0.03854728631364289]]
test input=[[0.0], [1.0]]
Cost = 0.008072757222331885
Test Prediction = [[0.03177524030447454]]
BUILD SUCCESSFUL (total time: 1 second)
=======================================================================
Source Code
/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package nn3;

import java.util.Arrays;

public class NN3 {

    /**
     * @author Deus Jeraldy
     * @Email: [email protected]
     * BSD License
     */
    // np.java -> https://gist.github.com/Jeraldy/7d4262db0536d27906b1e397662512bc
    public static void main(String[] args) {
        double[][] W1;
        double[][] b1;
        double[][] W2;
        double[][] b2;

        double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[][] Y = {{0}, {1}, {1}, {0}};
        int m = 4;
        int nodes = 400;

        X = np.T(X);
        Y = np.T(Y);

        W1 = np.random(nodes, 2);
        b1 = new double[nodes][m];
        W2 = np.random(1, nodes);
        b2 = new double[1][m];

        for (int i = 0; i < 4000; i++) {
            // Forward prop
            // LAYER 1
            double[][] Z1 = np.add(np.dot(W1, X), b1);
            double[][] A1 = np.sigmoid(Z1);
            // LAYER 2
            double[][] Z2 = np.add(np.dot(W2, A1), b2);
            double[][] A2 = np.sigmoid(Z2);

            double cost = np.cross_entropy(m, Y, A2);
            //costs.getData().add(new XYChart.Data(i, cost));

            // Back prop
            // LAYER 2
            double[][] dZ2 = np.subtract(A2, Y);
            double[][] dW2 = np.divide(np.dot(dZ2, np.T(A1)), m);
            double[][] db2 = np.divide(dZ2, m);
            // LAYER 1
            double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(A1, np.power(A1, 2)));
            // double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(1.0, np.power(A1, 2)));
            double[][] dW1 = np.divide(np.dot(dZ1, np.T(X)), m);
            double[][] db1 = np.divide(dZ1, m);

            // G.D
            W1 = np.subtract(W1, np.multiply(0.01, dW1));
            b1 = np.subtract(b1, np.multiply(0.01, db1));
            W2 = np.subtract(W2, np.multiply(0.01, dW2));
            b2 = np.subtract(b2, np.multiply(0.01, db2));

            if (i % 400 == 0) {
                System.out.println("==============");
                System.out.println("Cost = " + cost);
                System.out.println("Predictions = " + Arrays.deepToString(A2));
            }
        } // end of training

        // now to test
        // tX is the new input
        double[][] tX = {{0, 1}};
        tX = np.T(tX);
        System.out.println("\r\n");
        System.out.println("test input=" + Arrays.deepToString(tX));

        // Forward prop
        // LAYER 1
        double[][] tZ1 = np.add(np.dot(W1, tX), b1);
        double[][] tA1 = np.sigmoid(tZ1);
        // LAYER 2
        double[][] tZ2 = np.add(np.dot(W2, tA1), b2);
        double[][] tA2 = np.sigmoid(tZ2); // Prediction (Get Output here)

        double cost = np.cross_entropy(m, Y, tA2);
        //costs.getData().add(new XYChart.Data(i, cost));
        System.out.println("==============");
        System.out.println("Cost = " + cost);
        System.out.println("Test Prediction = " + Arrays.deepToString(tA2));
    }
}
However, what is happening is that the NN hasn't learned what I thought it was going to learn. When I try four inputs, no matter what I give it, it outputs the learned pattern:
test input=[[1.0, 0.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]]
Cost = 0.02185745791739911
Test Prediction = [[0.003408446482782547, 0.9872463542830693, 0.9452257141301419, 0.014738669159170737]]
test input=[[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
Cost = 0.03372416671312117
Test Prediction = [[0.002095908947609844, 0.9235729681325365, 0.998439571534914, 0.05041614843514693]]
So I think the NN has "learned" to output the static pattern from the training data rather than perform an XOR operation on a single pair.
When I test with 3 pairs, it outputs the first three static results of the training data:
test input=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
Cost = 0.0018916002564269888
Test Prediction = [[0.00336438364815533, 0.9975507610865536, 0.9982574181545878]]
I can make each prediction wrong by inverting the test data from the training data.
It doesn't seem like the X inputs are making a difference, only the Y.
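My best guess at the cause: b1 and b2 are allocated as [nodes][m] and [1][m], so every training column gets its own bias, and those per-column biases alone can fit Y; that would explain why X stops mattering and why a single test column only ever sees bias column 0. Below is a sketch of the change I would try, with one shared bias column broadcast across examples. The addBias and rowMean helpers are my own, they are not in the np gist, and I have not run this.

// Broadcast a [rows][1] bias column across every column of Z.
static double[][] addBias(double[][] Z, double[][] b) {
    double[][] out = new double[Z.length][Z[0].length];
    for (int r = 0; r < Z.length; r++) {
        for (int c = 0; c < Z[0].length; c++) {
            out[r][c] = Z[r][c] + b[r][0];
        }
    }
    return out;
}

// Average the columns of dZ into a [rows][1] bias gradient.
static double[][] rowMean(double[][] dZ) {
    double[][] out = new double[dZ.length][1];
    for (int r = 0; r < dZ.length; r++) {
        double sum = 0;
        for (int c = 0; c < dZ[0].length; c++) {
            sum += dZ[r][c];
        }
        out[r][0] = sum / dZ[0].length;
    }
    return out;
}

// Training then changes to (everything else as in the gist):
//   double[][] b1 = new double[nodes][1];
//   double[][] b2 = new double[1][1];
//   double[][] Z1 = addBias(np.dot(W1, X), b1);
//   double[][] Z2 = addBias(np.dot(W2, A1), b2);
//   double[][] db1 = rowMean(dZ1);
//   double[][] db2 = rowMean(dZ2);
// and a single test column, e.g. tX = np.T(new double[][]{{0, 1}}), goes through
// the same forward pass without any shape mismatch.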
Hi, I saw your article on Medium, thanks.
You didn't specify any dependency for
costs.getData().add(new XYChart.Data(i, cost));
Also "print" should be np.print I think?