Abstract: We study the problem of similarity self-join and similarity join size estimation in a streaming setting where the goal is to estimate, in one scan of the input and with sublinear space in ...