I have a Java application that needs to read a large amount of data from MongoDB 3.2 and transfer it to Hadoop.
This batch job runs every 4 hours, i.e. six times a day.
- Documents: 80,000 per run (every 4 hours)
- Size: 3 GB
Currently I am using MongoTemplate and Morphia to access MongoDB, but I get an OOM exception when loading the data like this:

```java
List<MYClass> datalist = datasource.getCollection("mycollection").find().asList();
```
What is the best way to read this data and load it into Hadoop? The options I am considering (rough sketches of each follow below):

- `MongoTemplate.stream()` and write to Hadoop one document at a time?
- `batchSize(someLimit)` and write each whole batch to Hadoop?
- `Cursor.batch()` and write to HDFS one document at a time?
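
For the streaming option, this is a minimal sketch of what I mean, assuming Spring Data MongoDB's `MongoTemplate.stream(Query, Class)`; `MYClass` is the mapped entity from above and `writeToHadoop` is a placeholder for whatever writes a single record:

```java
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.util.CloseableIterator;

public class StreamingExport {

    private final MongoTemplate mongoTemplate;

    public StreamingExport(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    public void export() {
        // stream() returns a cursor-backed iterator, so only the current document
        // (plus the driver's internal fetch batch) is held in memory at a time.
        try (CloseableIterator<MYClass> it = mongoTemplate.stream(new Query(), MYClass.class)) {
            while (it.hasNext()) {
                writeToHadoop(it.next()); // placeholder: write one record to Hadoop
            }
        }
    }

    private void writeToHadoop(MYClass doc) {
        // placeholder for the actual Hadoop write
    }
}
```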
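For the batching option, a sketch using the plain MongoDB 3.x Java driver. As I understand it, `batchSize` only controls how many documents the driver fetches per round trip, so the batch list here is re-buffered in the application; the database name and `someLimit` value are placeholders:

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class BatchedExport {

    public void export(MongoClient client) {
        MongoCollection<Document> coll =
                client.getDatabase("mydb").getCollection("mycollection"); // "mydb" is a placeholder
        int someLimit = 1000; // tune so one batch fits comfortably in the heap

        List<Document> batch = new ArrayList<>(someLimit);
        try (MongoCursor<Document> cursor = coll.find().batchSize(someLimit).iterator()) {
            while (cursor.hasNext()) {
                batch.add(cursor.next());
                if (batch.size() == someLimit) {
                    writeBatchToHadoop(batch); // placeholder: write one whole batch
                    batch.clear();
                }
            }
        }
        if (!batch.isEmpty()) {
            writeBatchToHadoop(batch); // flush the final partial batch
        }
    }

    private void writeBatchToHadoop(List<Document> batch) {
        // placeholder for the actual Hadoop write
    }
}
```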
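And for writing to HDFS one document at a time, a sketch against the Hadoop FileSystem API that any of the loops above could call into; the namenode URI and output path are made-up placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsWriter implements AutoCloseable {

    private final FileSystem fs;
    private final BufferedWriter out;

    // The namenode URI and output path below are placeholders.
    public HdfsWriter() throws IOException {
        this.fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
        this.out = new BufferedWriter(new OutputStreamWriter(
                fs.create(new Path("/data/mongo-export/part-00000.json")),
                StandardCharsets.UTF_8));
    }

    // Append one document as a JSON line.
    public void write(org.bson.Document doc) throws IOException {
        out.write(doc.toJson());
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        out.close();
        fs.close();
    }
}
```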