Santander Value Prediction Challenge - 2
Contact me
Email -> cugtyt@qq.com
GitHub -> Cugtyt@GitHub
尝试了神经网络:
# train_df已经去掉全为0的列和'target', 'ID'列
x_train, x_val, y_train, y_val = train_test_split(np.log1p(train_df), np.log1p(target), test_size=0.2)
# x_train, x_val, y_train, y_val = train_test_split(train_reduced, np.log1p(target), test_size=0.2)
model = Sequential([
Dense(100, input_dim=4735),
# Dense(100, input_dim=1008),
Dense(20),
Dense(1)
])
model.compile(optimizer='rmsprop', loss='mse')
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=40)
np.sqrt(mean_squared_log_error(target, np.exp(model.predict(np.log1p(train_df)))))
结果为1.5多。
测试随机森林:
rnd_reg = RandomForestRegressor(n_estimators=100, n_jobs=-1)
scores_rnd = cross_val_score(rnd_reg, train_df, np.log1p(target), scoring=make_scorer(mean_squared_error), cv=10)
rmsle_rnd = np.sqrt(scores_rnd)
rmsle_rnd
> array([1.40631864, 1.39308046, 1.49271305, 1.45397351, 1.34825134,
1.38194798, 1.39683042, 1.45427724, 1.55157639, 1.41461448])
加入缩放没有得到提升,让数据取log1p,也没有得到提升。