I just spent several hours trying to configure a pseudo-distributed Hadoop cluster inside a Docker container. I wanted to write up our experience in case anyone else makes the mistake of trying this themselves.
When we tried to save a file to HDFS with the Java client, the NameNode appeared to accept it. Using hdfs dfs -ls we could see the file listed in HDFS, but it had a size of 0, indicating no data had actually made it into the cluster.
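The check described above can be reproduced from the shell. A sketch, assuming a hypothetical file path of /data/foo.csv (the output line is illustrative, not captured from our cluster); note that it is the size column, not the file's mere presence, that reveals the problem:

```shell
# List the file in HDFS; the fifth column is the file size in bytes.
hdfs dfs -ls /data/foo.csv

# Illustrative output - a size of 0 means the NameNode created the
# file entry but no blocks were ever written to a DataNode:
#   -rw-r--r--   1 user supergroup   0 2024-01-01 12:00 /data/foo.csv
```

This is worth knowing because hdfs dfs -ls happily lists the empty file, so the write can silently appear to succeed until you inspect the size.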
The client also emitted a fairly unhelpful stack trace. The error looked similar to this:
org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File foo.csv could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) excluded in this operation.
This is just the stack trace from Hadoop, relayed back to the client.