Low-latency detection of human-machine interactions is an important problem. This work proposes faster detection of gestures by combining temporal features learnt on fixed-length blocks of input frames with features learnt from sequential context. Results are reported on a standard in-car hand gesture classification challenge dataset. Recurrent neural networks, which learn sequential context, are combined with 3D convolutional neural networks (C3D). We demonstrate that a design similar to the multi-column networks that have been successful for image classification and understanding can also improve classification performance on variable-length time series. Accordingly, a combination of 3D convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) networks is used to classify hand gestures. On the task of early hand gesture classification, the proposed model outperforms the C3D model, which reports the best results on full gestures. In experiments with incomplete hand gestures (half sequences), the proposed combination performs better by a large margin. On the full gesture length, it is second best and only slightly less accurate than the best performing method.
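The multi-column idea of combining a C3D stream with an LSTM stream can be sketched as a late fusion of per-class scores. The sketch below is a minimal illustration under the assumption that the two streams are fused by averaging their class probabilities; the function names and the averaging scheme are illustrative assumptions, not the paper's confirmed method.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_predictions(c3d_logits, lstm_logits):
    """Hypothetical late fusion: average the per-class probabilities of the
    C3D stream and the LSTM stream, then return the index of the most
    likely gesture class. (Averaging is an assumed fusion rule.)"""
    p_c3d = softmax(c3d_logits)
    p_lstm = softmax(lstm_logits)
    fused = [(a + b) / 2.0 for a, b in zip(p_c3d, p_lstm)]
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, if both streams assign their highest score to the same gesture class, the fused prediction keeps that class; when the streams disagree, the class with the higher averaged probability wins.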