调优的方法很多,调整参数的话可以用网格搜索、随机搜索等,调整性能的话,可以根据具体的数据和场景进行具体分析。调优后再跑一边算法,看结果有没有提高,如果没有,找原因,数据 or 算法?是数据质量不好,还是特征问题还是算法问题。一个一个排查,找解决方法。特征问题就回到第三步再进行特征工程,数据质量问题就回到第一步看数据清洗有没有遗漏,异常值是否影响了算法的结果,算法问题就回到第四步,看算法流程中哪一步出了问题。
Outlier: you are enumerating meticulously everything you have. You found 3 dimes, 1 quarter and wow a 100 USD bill you had put there last time you bought some booze and had totally forgot there. The 100 USD bill is an outlier, as it's not commonly expected in a pocket.
Noise: you have just come back from that club and are pretty much wasted. You try to find some money to buy something to sober up, but you have trouble reading the figures correctly on the coins. You found 3 dimes, 1 quarter and wow a 100 USD bill. But in fact, you have mistaken the quarter for a dime: this mistake introduces noise in the data you have access to.
To put it otherwise, data = true signal + noise. Outliers are part of the data.