General update, what I've been working on, and about Reddit

#### Ahoy there, crypto enthusiasts! This was another busy, productive week: I did some research, had a setback, and added a feature to the website. Let's get started :)

### Sentiment analysis

The week started with some research. I have a little CPU time left on my sync machine, because Twitter limits my API calls, and I thought that could be put to good use for some sentiment analysis. During my studies there was a course called "Web Analytics", in which we discussed ways of analyzing big, fuzzy data, and I loved it. It was one of the few courses I signed up for, and it was the only course that my group of peers and I finished with a 100% score. It was actually more than 100%, because of some bonus points we could earn on assignments, but there was no way to credit us with more than the maximum score. There were assignments about typical mathematical classifier algorithms and some with more freedom to choose, say, neural networks or other predictive models. With some free time this past weekend, it seemed like a good moment to put that knowledge to use.

I tried two different algorithms for analyzing the sentiment of tweets: a Bayesian classifier and a support vector machine. These algorithms are known to be fast and were an easy way to get started. Programming them is an easy task, and I was getting results after just a few hours. This was one of the best moments in my programming life, and I felt a bit magical.

From then on, things went downhill. First of all, the data was bad. I was training my algorithms on a set of tweets that someone had gathered and manually labelled in 2011. [Link for those interested](https://github.com/karanluthra/twitter-sentiment-training), [original work by Niek J. Sanders](http://www.sananalytics.com/lab/twitter-sentiment/).
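
For readers who wonder what a Bayesian classifier for tweets looks like, here is a minimal sketch in Python. It is an illustration, not the code that ran on the sync machine: the function names, the whitespace tokenizer, the add-one smoothing and the six made-up training tweets are all assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it on whitespace (deliberately naive)."""
    return text.lower().split()


def train(labelled_tweets):
    """Count tweets per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_tweets:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for this tweet."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        # Log prior: how common this label is in the training data.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing keeps unseen word/label pairs above zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up examples for illustration, not taken from the Sanders data set.
TRAINING = [
    ("love this coin great gains", "positive"),
    ("great team great project", "positive"),
    ("terrible dump lost money", "negative"),
    ("scam coin lost everything", "negative"),
    ("new block mined today", "neutral"),
    ("wallet update released today", "neutral"),
]

model = train(TRAINING)
print(classify("great gains", model))
```

Note that the sketch simply skips words it never saw during training, so a classifier trained on tweets about one topic has little to go on when the live tweets use a different vocabulary.
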
The training set contains about 5,500 tweets about technology companies, whereas the general consensus is that you need about 10,000 training tweets for moderately acceptable results. Also, cryptocurrency uses different keywords for positive and negative sentiment. As a result, about 95% of live tweets were labelled as neutral or irrelevant, and only a few as positive or negative. Not good enough.

A test on live data also made it apparent that classifying data was a lot of work for my tiny virtual machine, and that it would take a lot of effort to improve that. A full scan of the top 400 market cap coins took way longer than 15 minutes, whereas staying under that is one of my requirements. Optimizing any machine learning algorithm is very hard, [as I've come to learn](https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization), and I've decided not to pursue this path any further.

### Outage

If you look at the charts for all coins, you'll see a few gaps. While developing, I sent a development executable into the cloud that didn't save any data to the database. 😓 It took about a day before I noticed... To everybody who used the website during that time: I'm sorry.

I do notice that it doesn't really matter much that the data is missing. The big movements are on the week-to-month scale, which is interesting. Maybe I'll add some of those volume changes to the website at a later stage.

### Reddit added

Now for the real deal this week. I've started tracking Reddit volumes! This started out as a simple experiment, but the first days of data indicate a lot more than the Twitter volumes did.

The data is better because:

* Where tweets can have a bunch of hashtags, a subreddit post can be linked to a single specific coin.
* We're not searching for common words, so all data is related to that specific currency.

It's worse because:

* The numbers are smaller, and the daily changes are higher. This means that the data is better when averaging over a longer period.
There aren't many spikes in the graphs, but a single measurement point says a lot less about activity levels.

#### Analysis

First of all, the top 5 coins, ordered by Reddit activity (posts in the last 24 hours), plus some others I deem interesting:

* 1 Bitcoin (/r/bitcoin) 328
* 2 VertCoin (/r/vertcoin) 212
* 3 Groestlcoin (/r/groestlcoin) 48
* 4 Ethereum (/r/ethereum) 46
* 5 BitConnect (/r/bitconnect) 36
* 8 Bitcoin Cash (/r/bitcoincash) 30
* 9 Litecoin (/r/litecoin) 20
* 10 Ripple (/r/ripple) 19

This is interesting, right? Connections I gathered from this data:

/r/bitcoin is very popular, about seven times as active as /r/ethereum, but I believe a lot of people new to crypto will visit /r/bitcoin before they visit /r/cryptocurrency. Also, a lot of people who are interested in bitcoin are not very into the whole altcoin culture and stick with bitcoin/fiat trading or holding.

Then there's Ethereum at 46 Reddit posts. This is the biggest smart contract coin, and it needs a large community to develop in a healthy way. It looks like a healthy subreddit, and it seems to be a better baseline for measurements than /r/bitcoin is.

Then there are some very active subreddits, the ones for VertCoin and Groestlcoin. Both coins have had a major price increase in the past week. It's a shame my data doesn't go back further than that, but the volumes on those coins are pretty high. Indicators of anything?

Play around with the graphs a bit. The subreddits need to be added manually, so if you see a mistake or if a new subreddit needs to be added, send me a message :)

Happy investing,
Tane
©Tane van Wifferen - 2018