Building a time series prediction model on web login timestamps


I'm trying to build a time series prediction model. All I have is a sequence of timestamps recording when a user logs in to a site.

Here are the first few rows of the data, which I have in a pandas Series:

0   2012-03-01 00:05:55
1   2012-03-01 00:06:23
2   2012-03-01 00:06:52
3   2012-03-01 00:11:23
4   2012-03-01 00:12:47
5   2012-03-01 00:12:54
6   2012-03-01 00:16:14
7   2012-03-01 00:17:31
8   2012-03-01 00:21:23
9   2012-03-01 00:21:26

My questions are:

1) How can I graph the user's behavior on an hourly basis when all I have is timestamps, with no Y values or other features?

2) How can I build a model that fits this time series and predicts the next two weeks?

Dec 7, 2018 in Data Analytics by Shubham

1 answer to this question.


I did something similar and ran into the same problem.

I grouped the timestamps by hour, using the epoch time of each hour as the key, and loaded the counts into a dictionary.

From there I could work on the time series in hourly chunks. (My data source was JSON.) You can then convert it to a pandas DataFrame and chart it directly with matplotlib. Since your data is already in pandas, you can skip the data pull and edit the initial loop to process your raw data. I hope this helps.

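import calendar
import time

hourdict = {}   # hour label -> count (renamed from `dict`, which shadows the built-in)
epochdict = {}  # start of hour in epoch seconds (UTC) -> count

# responseJson['All'] is assumed to map epoch milliseconds (as strings) to counts
for key, h in responseJson['All'].items():
    # Truncate the timestamp to the start of its hour (UTC)
    t = time.strftime('%Y,%m,%d %H:00:00', time.gmtime(float(key) / 1000.0))
    # timegm keeps the value in UTC; time.mktime would treat it as local time
    epochkey = calendar.timegm(time.strptime(t, '%Y,%m,%d %H:00:00'))

    # Accumulate the count for this hour in both dictionaries
    hourdict[t] = hourdict.get(t, 0) + h
    epochdict[epochkey] = epochdict.get(epochkey, 0) + h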
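
If the timestamps are already in a pandas Series, as in the question, the same hourly bucketing can probably be done without the loop. This is a minimal sketch, assuming the Series holds datetime values; the names `logins` and `hourly_counts` are illustrative and not part of the original code.

```python
import pandas as pd

# Sample timestamps copied from the question; in practice use the full Series.
logins = pd.Series(pd.to_datetime([
    "2012-03-01 00:05:55",
    "2012-03-01 00:06:23",
    "2012-03-01 00:06:52",
    "2012-03-01 00:11:23",
    "2012-03-01 00:12:47",
    "2012-03-01 00:12:54",
    "2012-03-01 00:16:14",
    "2012-03-01 00:17:31",
    "2012-03-01 00:21:23",
    "2012-03-01 00:21:26",
]))


def hourly_counts(timestamps: pd.Series) -> pd.Series:
    """Count logins per hour; hours with no logins get a count of 0."""
    # Give each login a value of 1, index it by its timestamp, and sum per hour.
    ones = pd.Series(1, index=pd.DatetimeIndex(timestamps))
    return ones.resample("h").sum()


hourly = hourly_counts(logins)
```

`hourly` then has one row per hour, which supplies the Y values the question was missing. Calling `hourly.plot()` should chart it if matplotlib is installed.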

Then I converted the hourly counts to a pandas DataFrame:

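from pandas import DataFrame

data = {}
for epochkey, count in epochdict.items():
    # One entry per hour, keyed by its epoch timestamp; the 'count' column name is an assumption
    data[str(epochkey)] = {'count': round(count, 3)}

# Transpose so that each hour becomes a row; fill any missing cells with 0
df = DataFrame(data).T.fillna(0)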
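
For the second part of the question (predicting the next two weeks), one simple baseline is to average the historical count for each weekday/hour slot and repeat that weekly profile forward. This is an assumption added here for illustration, not the method described in the answer; the function name `seasonal_profile_forecast` and the synthetic demo data are hypothetical.

```python
import pandas as pd


def seasonal_profile_forecast(hourly: pd.Series, days: int = 14) -> pd.Series:
    """Forecast hourly counts as the historical mean of each weekday/hour slot."""
    # Map every observed hour to one of 168 weekly slots (weekday * 24 + hour).
    slot = (hourly.index.dayofweek * 24 + hourly.index.hour).to_numpy()
    profile = hourly.groupby(slot).mean()

    # Hourly timestamps for the forecast horizon, starting after the last observation.
    future = pd.date_range(hourly.index[-1] + pd.Timedelta(hours=1),
                           periods=days * 24, freq="h")
    future_slot = (future.dayofweek * 24 + future.hour).to_numpy()

    # Slots never seen in the history fall back to the overall mean.
    values = profile.reindex(future_slot).fillna(hourly.mean()).to_numpy()
    return pd.Series(values, index=future, name="forecast")


# Synthetic demo data (not from the question): three weeks in which
# the hourly count simply equals the hour of the day.
idx = pd.date_range("2012-03-01", periods=21 * 24, freq="h")
demo = pd.Series(idx.hour.to_numpy(), index=idx, dtype=float)
forecast = seasonal_profile_forecast(demo)
```

A baseline like this only captures a repeating weekly pattern. It is mainly useful as a reference point to compare against a fitted model, and it needs at least a few weeks of history to be meaningful.
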
answered Dec 7, 2018 by Upasana
