A central challenge in developing an adaptive dialogue system is estimating the user's sentiment in real time, because the user's self-reported sentiment does not necessarily appear in the user's utterances. We address this problem by proposing a new attention mechanism based on time-series physiological signals and word sequences. Physiological LSTM models built on our proposed physiological signal processing method achieved higher performance than the statistics-based physiological model. Moreover, we extend our physiological signal processing method to the Transformer language model and propose the Time-series Physiological Transformer (TPTr). Our proposed methods significantly outperformed the previous best result ($p<0.05$).