I want to set the outlier values to NaN. Here is the code I am using right now. Can someone explain how to do this?

    import numpy as np
    import matplotlib.pyplot as plt

    data = np.random.rand(1000) + 5.0

    plt.plot(data)
    plt.xlabel('observation number')
    plt.ylabel('recorded value')
    plt.show()

Here's an implementation for the N-dimensional case (from some code for a paper here: https://github.com/joferkington/oost_paper_code/blob/master/utilities.py):
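A minimal sketch of such a test, assuming it is based on the median absolute deviation (MAD) with Euclidean distances from the median. The function name `is_outlier`, the default threshold of 3.5, and the sample data are assumptions for illustration and may differ from the code in the linked file:

```python
import numpy as np

def is_outlier(points, thresh=3.5):
    """Return a boolean mask that is True where an observation looks like an outlier.

    points: array of shape (n_observations,) or (n_observations, n_dimensions).
    thresh: modified z-score above which an observation is flagged
            (3.5 is an assumed default, not a universal rule).
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        # Treat 1-D input as n observations of a single dimension.
        points = points[:, None]
    median = np.median(points, axis=0)
    # Euclidean (L2) distance of each observation from the per-dimension median.
    dist = np.sqrt(np.sum((points - median) ** 2, axis=-1))
    mad = np.median(dist)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data. Note: mad == 0 is not handled here.
    modified_z_score = 0.6745 * dist / mad
    return modified_z_score > thresh

# Illustrative data: 50 values between 5 and 6 plus one obvious outlier.
data = np.concatenate([np.linspace(5.0, 6.0, 50), [20.0]])
mask = is_outlier(data)

# Replace flagged values with NaN, leaving the original array untouched.
data_clean = np.where(mask, np.nan, data)
```

Because the mask is boolean, the same result could be had in place with `data[mask] = np.nan`, provided the array has a floating-point dtype.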

There are a huge number of ways to test for outliers, and you should give some thought to how you classify them. Ideally, you should use a priori information (e.g. "anything above/below this value is unrealistic because...").
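When such a priori limits are available, the test can be a plain range check. A small sketch, assuming hypothetical limits of 4.0 and 7.0 and a few injected bad readings (neither comes from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(1000) + 5.0   # same kind of data as in the question
data[::100] = 50.0              # inject 10 obviously bad readings (illustrative)

# Hypothetical physical limits; in practice these come from domain knowledge.
lower, upper = 4.0, 7.0

# Keep values inside the limits, replace everything else with NaN.
cleaned = np.where((data < lower) | (data > upper), np.nan, data)
```

NaN-aware functions such as `np.nanmean` can then be used on `cleaned`, and matplotlib leaves gaps in a line plot where the values are NaN.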

Code from http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers. This uses the L1 distance instead of the L2 distance, and supports asymmetric distributions.

    def doubleMADsfromMedian(y, thresh=3.5):
        # warning: this function does not check for NAs
        # nor does it address issues when
        # more than 50% of your data have identical values
        m = np.median(y)
        abs_dev = np.abs(y - m)
        left_mad = np.median(abs_dev[y <= m])
        right_mad = np.median(abs_dev[y >= m])
        y_mad = left_mad * np.ones(len(y))
        y_mad[y > m] = right_mad
        modified_z_score = 0.6745 * abs_dev / y_mad
        modified_z_score[y == m] = 0
        return modified_z_score > thresh
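To see why the two-sided MAD matters, and to tie this back to the original question, the same steps can be applied inline to a small right-skewed sample. The sample values below are made up for illustration:

```python
import numpy as np

# Made-up, right-skewed sample: tightly packed below the median, spread out above.
y = np.array([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 3.0, 5.0, 9.0, 40.0])

m = np.median(y)
abs_dev = np.abs(y - m)

# Separate MADs for the lower and upper halves of the data.
left_mad = np.median(abs_dev[y <= m])
right_mad = np.median(abs_dev[y >= m])

# Each point is scaled by the MAD of the side it falls on.
mad_per_point = np.where(y > m, right_mad, left_mad)
score = 0.6745 * abs_dev / mad_per_point
score[y == m] = 0

# Set flagged points to NaN, as asked in the question.
cleaned = np.where(score > 3.5, np.nan, y)
```

Here the upper half is far more spread out than the lower half, so `right_mad` is larger than `left_mad` and moderately large values such as 9.0 are kept, while 40.0 is flagged and replaced with NaN.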

