However, kicking the tires is a lot easier at the car dealership. Getting something like this running means clearing quite a few hurdles. It would no doubt be easier if I were running on Windows, or if I were willing to install into VirtualBox or Docker. The manual installation process for the Mac took me several hours to crack, because the error messages don't make it clear what the problem is. It's not unusual for an early-stage (incubating) project to have documentation and installation issues, and on the whole they've done an outstanding job. But in case anyone else out there is having similar problems, here are my fixes to the installation docs. For the Mac (or Linux), the installation procedure I followed is Installing Apache PredictionIO (incubating) on Linux / Mac OS X.
PredictionIO is built on Spark plus a storage backend. You can use PostgreSQL, MySQL, or a combination of HBase and Elasticsearch. I chose the latter because it appears to be the recommended option.
After following the instructions, I edited the conf/pio-env.sh file to point SPARK_HOME at my own existing Spark installation:
SPARK_HOME=/Applications/spark-1.6.1
The first problem I ran into was the following cryptic error message, reported when I checked the status with pio status:
/Applications/spark-1.6.1 is probably an Apache Spark development tree. Please make sure you are using at least 1.3.0.
It turns out that you need a version of Spark pre-built with Hadoop (I had built this one from source). Before realizing that was the problem, I simply downloaded their recommended version into the vendors directory (rather than trying to put it in my own area) and edited the pio-env.sh file:
SPARK_HOME=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6
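If you want to catch this before pio status does, here is a small sketch of a pre-flight check. It assumes, based on my reading of the error rather than anything stated in the docs, that the warning fires when SPARK_HOME has no RELEASE file at its top level, which the pre-built Spark downloads include and an in-place source build does not. check_spark_home is a hypothetical helper name of my own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a SPARK_HOME candidate before running pio status.
# Assumption: pre-built Spark downloads ship a RELEASE file at the top level,
# while a tree built in place from source does not.
check_spark_home() {
  local dir="$1"
  # PredictionIO launches jobs through spark-submit, so it must be present
  if [ ! -x "$dir/bin/spark-submit" ]; then
    echo "no executable bin/spark-submit under $dir" >&2
    return 1
  fi
  # A missing RELEASE file is (I believe) what triggers the "development tree" warning
  if [ ! -f "$dir/RELEASE" ]; then
    echo "no RELEASE file in $dir; try a Spark download pre-built with Hadoop" >&2
    return 2
  fi
  # Print the first line of RELEASE, which names the Spark and Hadoop versions
  head -n 1 "$dir/RELEASE"
}
```

Under that assumption, running check_spark_home against $PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6 should print the version line from its RELEASE file, while a source-built tree should fail the second test.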
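With that change, pio status got past Spark but then failed on storage:
[ERROR] [Console$] Unable to connect to all storage backends successfully. The following shows the error message from the storage backend.
[ERROR] [Console$] Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. (org.postgresql.util.PSQLException)
Port 5432 is PostgreSQL's default, so my pio-env.sh was evidently still pointing the storage repositories at PostgreSQL. Since I was using HBase and Elasticsearch instead, I changed the repository settings to the following: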
PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
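To double-check which backend each repository will actually use, here's a rough sketch that lists the active (uncommented) PIO_STORAGE_REPOSITORIES_*_SOURCE lines in a pio-env.sh file and complains if any of them still say PGSQL. The helper names storage_sources and check_no_pgsql are hypothetical, and the sketch assumes the settings are written unquoted, one per line, as in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<REPOSITORY> <SOURCE>" for each active
# PIO_STORAGE_REPOSITORIES_*_SOURCE line, skipping commented-out lines.
storage_sources() {
  local env_file="$1"
  grep -E '^[[:space:]]*PIO_STORAGE_REPOSITORIES_[A-Z]+_SOURCE=' "$env_file" \
    | sed -E 's/^[[:space:]]*PIO_STORAGE_REPOSITORIES_([A-Z]+)_SOURCE=(.*)$/\1 \2/'
}

# Hypothetical helper: return non-zero if any repository still points at PostgreSQL.
check_no_pgsql() {
  if storage_sources "$1" | grep -q ' PGSQL$'; then
    echo "a repository still uses PGSQL (PostgreSQL, port 5432)" >&2
    return 1
  fi
}
```

For example, check_no_pgsql conf/pio-env.sh should return non-zero if a PGSQL source was left behind after switching to HBase and Elasticsearch.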
Restarting and checking the status generated this error:
[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
This took me a while to figure out. I had actually installed ZooKeeper, although it turns out I didn't need to; in fact, you should not have a separate ZooKeeper running if you're simply running HBase locally, because standalone HBase manages its own. The real problem was that I hadn't properly configured HBase in the vendors/hbase-1.0.0/conf/hbase-site.xml configuration file. Look carefully at the values you will find here; they will not match your setup!
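As a starting point, here's a minimal sketch that writes a standalone-mode hbase-site.xml whose paths match your own machine. The two property names, hbase.rootdir and hbase.zookeeper.property.dataDir, are standard HBase settings; the choice to keep the data and ZooKeeper files in data and zookeeper subdirectories of the HBase directory, and the helper name write_hbase_site, are my own assumptions, so adjust the paths to taste.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a standalone-mode hbase-site.xml using this
# machine's real paths instead of placeholder values.
# Assumption: data and ZooKeeper files live under the HBase install directory.
write_hbase_site() {
  local hbase_home site
  # Resolve to an absolute path, since hbase.rootdir needs a file:// URL
  hbase_home="$(cd "$1" && pwd)" || return 1
  site="$hbase_home/conf/hbase-site.xml"
  mkdir -p "$hbase_home/conf"
  # Keep a copy of the existing file, just in case
  if [ -f "$site" ]; then
    cp "$site" "$site.bak"
  fi
  cat > "$site" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$hbase_home/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$hbase_home/zookeeper</value>
  </property>
</configuration>
EOF
}
```

Something like write_hbase_site $PIO_HOME/vendors/hbase-1.0.0 would then overwrite the sample file, leaving the previous version alongside it as hbase-site.xml.bak.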
Once I had fixed that problem, and rechecked the status, I saw the following wonderful message:
[INFO] [Console$] Your system is all ready to go.
You can also check that all is well by running jps. You should see something like the following:
41109 Console
41014 Elasticsearch
41079 HMaster
41127 Jps
29769
If you don't see HMaster included, then HBase is not running correctly.
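If you'd rather script that check, here's a small sketch that reads jps output on standard input and reports whether Elasticsearch and HMaster, the two storage processes from the listing above, appear in it. check_pio_daemons is a hypothetical name; pipe jps into it (jps | check_pio_daemons) and it exits non-zero if either process is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and check that the
# Elasticsearch and HBase master processes are running.
check_pio_daemons() {
  local listing name missing=0
  listing="$(cat)"
  for name in Elasticsearch HMaster; do
    # jps prints "<pid> <MainClassName>"; match the name exactly in field 2
    if printf '%s\n' "$listing" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name: running"
    else
      echo "$name: NOT running" >&2
      missing=1
    fi
  done
  return "$missing"
}
```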
Now it's time to clone a template. I'm interested in recommender systems, so I navigated to the relevant set of instructions, which state in section 2 (Create a new Engine from an Engine Template) that you should run the following command:
pio template get PredictionIO/template-scala-parallel-recommendation MyRecommendation
However, that is (currently at least) incorrect. Instead, use the following:
pio template get PredictionIO/template-scala-parallel-universal-recommendation MyUniversalRecommendation
Once I figured out the correct mumbo-jumbo to clone the template, I was able to create my own recommendation engine. Now, I shall (hopefully) enjoy actually doing some implementation!
Stay tuned in case I have more tips.
Update 9/30/2016: there's one more thing you have to do, at least for the universal recommendation engine, that is not spelled out in the installation instructions. Before you can import the sample data, you must install the Python predictionio package. Do this by entering the following:
pip install predictionio
If that doesn't work for you, consult this GitHub project: https://github.com/apache/incubator-predictionio-sdk-python
One more problem arose during the training phase:
[INFO] [Engine$] Data sanity check is on.
Exception in thread "main" java.lang.NumberFormatException: For input string: "558:feed::1"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
The input string looks like a fragment of an IPv6 address, but the solution this time is something I never would have imagined in a million years, let alone solved! Here's the link to the issue. The bottom line is: if you're running on a Mac, change the TCP/IP Wi-Fi network settings so that Configure IPv6 is set to manual, not automatic.
Another problem appeared after deploying the engine and trying to make a query:
[ERROR] [OneForOneStrategy] Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers
It turned out, however, that this particular solution was necessary but not sufficient. In the same discussion there's a mention of the SSL problem that I ran into next:
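[ERROR] [HttpServerConnection] Aborting encrypted connection to /0:0:0:0:0:0:0:1:52194 due to [SSLException:Unrecognized SSL message, plaintext connection?]
This error means the server expected an encrypted connection but received plain HTTP. Sending the query over HTTPS, with curl's -k option to skip certificate verification, finally worked:
curl -kH "Content-Type: application/json" -d '
{
"user": "u1"
}' https://localhost:8000/queries.json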
Recommendations for user: u1
{"itemScores":[{"item":"Galaxy","score":0.8880454897880554},{"item":"Nexus","score":0.24007925391197205},{"item":"Surface","score":0.043848853558301926}]}