How to store your clickstream data
In this blog post, I would like to talk about clickstream data.
Why is it so important to store clickstreams?
There are many answers to this question. In my opinion, the most important reason is that it helps you better understand your visitors' behavior, so you can redesign your business model according to the data.
Table of Contents:
- Which technologies will we use?
- Divolte overview
- Divolte configuration
- Writing clickstream data to Kafka
1. Which technologies will we use?
Divolte:
Divolte is a scalable clickstream collection platform. It provides a JavaScript API and an HTTP endpoint for the client side, and you can store your data in Kafka, HDFS, or on Google Cloud Platform.
Kafka:
Kafka is a distributed streaming platform; we will send the clickstream data from Divolte directly to Kafka.
2. Divolte Overview
Divolte is a collection server for your clickstream data. You can store clickstream data directly in HDFS or Kafka, and it provides a JavaScript tag for the client side.
Divolte uses the Apache Avro serialization system to store the data.
Divolte also provides ip2location support and user-agent parsing, and you can define your custom events in the configuration file.
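For reference, the client side only needs to load the Divolte tag from the collector; divolte.js is the default name and 8290 the default port (this snippet assumes the local demo setup used later in this post):

<script src="http://localhost:8290/divolte.js" defer async></script>

Once the tag is loaded, a global divolte object is available for signaling custom events.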
3. Divolte Configuration
First, you can check out my sample code from GitHub to follow my configuration.
You can find the Divolte configuration files in the “data/divolte” folder.
You will see three different files:
- divolte-collector.conf: You can specify the server configuration in this file. It has four sections; a minimal sketch of how they fit together follows this list.
a. sources: You can define different sources for events, along with the JavaScript file name, cookie name, cookie expiry time, and so on.
b. mappings: You can map your sources to sinks. In my case, I mapped the click_stream source to Kafka.
c. global: Global server settings are stored in this section.
d. sinks: Sink configurations are stored in this section.
- eventRecord.avsc: This file defines the Avro schema of your data.
- mapping.groovy: This script maps your clickstream data onto the Avro schema.
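Here is a minimal, simplified sketch of how these three files could fit together for this setup. The source, sink, and topic names and the item_id/event_date fields follow the examples later in this post, while the file paths and the exact field list are my assumptions; check the actual files in data/divolte for the real values.

divolte-collector.conf, wiring one browser source through one mapping to a Kafka sink:

divolte {
  global {
    kafka {
      // Enable the Kafka flusher; localhost:9092 matches the docker-compose setup.
      enabled = true
      producer = {
        bootstrap.servers = "localhost:9092"
      }
    }
  }
  sources {
    click_stream {
      type = browser
      javascript.name = divolte.js
    }
  }
  mappings {
    click_stream_mapping {
      schema_file = "/opt/divolte/conf/eventRecord.avsc"
      mapping_script_file = "/opt/divolte/conf/mapping.groovy"
      sources = [click_stream]
      sinks = [kafka_sink]
    }
  }
  sinks {
    kafka_sink {
      type = kafka
      topic = click_stream
    }
  }
}

eventRecord.avsc, the Avro schema; I keep the custom parameters as nullable strings to keep the sketch simple:

{
  "namespace": "io.divolte.record",
  "type": "record",
  "name": "EventRecord",
  "fields": [
    { "name": "timestamp",  "type": "long" },
    { "name": "remoteHost", "type": "string" },
    { "name": "eventType",  "type": ["null", "string"], "default": null },
    { "name": "location",   "type": ["null", "string"], "default": null },
    { "name": "item_id",    "type": ["null", "string"], "default": null },
    { "name": "event_date", "type": ["null", "string"], "default": null }
  ]
}

mapping.groovy, filling those fields from the request and the custom event parameters:

mapping {
  map timestamp() onto 'timestamp'
  map remoteHost() onto 'remoteHost'
  map eventType() onto 'eventType'
  map location() onto 'location'
  // Custom values passed to divolte.signal(...) arrive as event parameters.
  map eventParameters().value('item_id') onto 'item_id'
  map eventParameters().value('event_date') onto 'event_date'
}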
In my case, I only customized some basic values. You can configure more if you need to; I recommend the official Divolte documentation for the details.
You will find Kafka already configured in the Docker Compose file.
If you are ready, we can test whether our clickstream data is going to Kafka.
Bring up the docker-compose file via the command below:
docker-compose up -d
You will see the Divolte homepage when you visit http://localhost:8290. You can actually disable this page from the configuration, because you don’t need the welcome page, only the JavaScript tag. I enabled it for the demo :)
Sending a clickstream event through Divolte:
You can run the command below in the browser console to test it:
divolte.signal('event', {"item_id": 10, "event_date":3213213})
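In a real page you would call divolte.signal from your own event handlers instead of the console. Here is a small hypothetical sketch (the product-link selector, the item_click event name, and the data-item-id attribute are my own illustrations, and the Divolte tag from the overview section must already be loaded):

// Hypothetical sketch: signal a custom event whenever a product link is clicked.
document.querySelectorAll('a.product-link').forEach(function (link) {
  link.addEventListener('click', function () {
    divolte.signal('item_click', {
      item_id: link.dataset.itemId,  // read by eventParameters() in mapping.groovy
      event_date: Date.now()
    });
  });
});

The first argument is the event type, and the second is a map of parameters that the mapping script can pick up with eventParameters().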
4. Writing clickstream data to Kafka
Now we can connect to the Kafka server to check the data via this command:
docker-compose exec kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic click_stream --from-beginning
You will see strange characters in the messages because they are Avro-serialized binary; don’t worry about the format.
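If you would rather see decoded records than raw bytes, you can deserialize the messages with the same Avro schema. Here is a minimal Node.js sketch, assuming the kafkajs and avsc packages (my choice, not part of this post's setup) and the eventRecord.avsc sketched above; as far as I know, Divolte's Kafka sink writes each record as plain Avro binary without a schema registry:

// Minimal sketch: consume the click_stream topic and decode the Avro records.
const fs = require('fs');
const avro = require('avsc');
const { Kafka } = require('kafkajs');

// Parse the same schema that Divolte used to serialize the records.
const schema = JSON.parse(fs.readFileSync('data/divolte/eventRecord.avsc', 'utf8'));
const type = avro.Type.forSchema(schema);

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'clickstream-check' });

(async () => {
  await consumer.connect();
  await consumer.subscribe({ topic: 'click_stream', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // Each Kafka message value is one Avro-encoded EventRecord.
      console.log(type.fromBuffer(message.value));
    },
  });
})();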
Thanks for reading.