Cheat detection
Whenever a challenge is solved, a cheatScore between 0 and 1 is internally assigned to the solution. It indicates how likely it is that the solution was achieved by cheating.
Cheat score calculation
The calculation currently relies only on the time difference between the current and the previous solve in relation to the difficulty of the current challenge. The cheat score also factors in whether hints and/or tutorials are enabled or disabled on the Score Board.
Challenge Difficulty | Minimum solve time | w/o hints | w/ tutorial |
---|---|---|---|
⭐ | 2 minutes | +1 min | ÷2 |
⭐⭐ | 4 minutes | +2 min | ÷2 |
⭐⭐⭐ | 6 minutes | +3 min | ÷2 |
⭐⭐⭐⭐ | 8 minutes | +4 min | n/a |
⭐⭐⭐⭐⭐ | 10 minutes | +5 min | n/a |
⭐⭐⭐⭐⭐⭐ | 12 minutes | +6 min | n/a |
The underlying formula assumes that a non-cheating user requires a certain absolute minimum amount of time to solve hacking challenges. Note that this does not imply that you are expected to need only this minimum number of minutes for a challenge of a given difficulty.
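The table above can be read as a small calculation. The following is a minimal sketch of that reading, not the actual Juice Shop implementation. The function and parameter names are hypothetical, and two points are assumptions that the table does not settle: the "w/o hints" surcharge is applied before the "w/ tutorial" halving, and the score falls linearly from 1 to 0 as the elapsed time approaches the minimum solve time.

```python
def minimum_solve_minutes(difficulty: int, hints_enabled: bool = True,
                          tutorial_available: bool = False) -> float:
    """Minimum solve time in minutes, read off the table above."""
    if not 1 <= difficulty <= 6:
        raise ValueError("difficulty must be between 1 and 6 stars")
    minutes = 2.0 * difficulty      # base column: 2 minutes per star
    if not hints_enabled:
        minutes += difficulty       # "w/o hints" column: +1 minute per star
    # "w/ tutorial" column: only listed for 1 to 3 stars (n/a above that).
    # ASSUMPTION: the halving is applied after the hints surcharge.
    if tutorial_available and difficulty <= 3:
        minutes /= 2
    return minutes


def time_based_cheat_score(minutes_since_previous_solve: float,
                           minimum_minutes: float) -> float:
    """ASSUMPTION: linear ramp from 1 (instant solve) to 0 (minimum time reached)."""
    score = 1.0 - minutes_since_previous_solve / minimum_minutes
    return min(1.0, max(0.0, score))
```

Under these assumptions, a ⭐⭐ challenge solved one minute after the previous solve, with hints enabled and no tutorial, would score `time_based_cheat_score(1, minimum_solve_minutes(2))`, which is 0.75.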
Coupled challenges
The cheat scoring takes into account that some challenges will be solved in the same HTTP request, for example:
- logging in the admin user with his weak password also solves logging in the admin by any means (e.g. SQL Injection)
- both XXE challenges automatically solve using a deprecated B2B interface
- the generic null byte challenge is typically solved along with the first actual exploit to access some sensitive file from /ftp
To avoid false positive cheat scoring, the second of two coupled challenge solves will never count as cheating when they happen in sequence.
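One way to picture this exemption is a lookup of unordered challenge pairs that is consulted before the time-based score is used. The sketch below is illustrative only: the challenge keys are hypothetical placeholders rather than the identifiers Juice Shop uses, and the null byte case is left out because its partner is "the first actual exploit" rather than one fixed challenge.

```python
from typing import Optional

# Hypothetical challenge keys, chosen for illustration only.
COUPLED_PAIRS = {
    frozenset({"weak_admin_password", "login_admin"}),
    frozenset({"xxe_first", "deprecated_b2b_interface"}),
    frozenset({"xxe_second", "deprecated_b2b_interface"}),
}


def are_coupled(current: str, previous: Optional[str]) -> bool:
    """True if the two challenges are known to be solvable by the same request."""
    if previous is None:
        return False
    # frozenset makes the check independent of the solve order.
    return frozenset({current, previous}) in COUPLED_PAIRS


def score_after_coupling(current: str, previous: Optional[str],
                         time_based_score: float) -> float:
    """The second of two coupled solves in sequence never counts as cheating."""
    return 0.0 if are_coupled(current, previous) else time_based_score
```

Storing unordered pairs means it does not matter which of the two coupled challenges happens to be registered first.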
Trivial challenges
Some challenges are so frequently solved by accident or coincidence that it would be unfair to take them into account for cheat scoring at all. This includes:
- triggering any kind of error that is improperly handled by the application, which is often solved as a by-product of solving other challenges
- reading the privacy policy of the shop, which is merely a challenge making fun of the fact that almost nobody reads those in real life
Expected preceding interactions
While some challenges offer different attack paths, others have only one. Solving challenges of the latter type often involves some exploration of the web client, the API or other server-side endpoints. URLs which are very likely to have been visited before a specific challenge is honestly solved are tracked and counted by the cheat detection. If none or only some of these preceding interactions took place before a challenge was solved, the cheat score is increased by half the percentage of expected interactions which did not happen. For example, if half of the expected URLs were never visited, the cheat score increases by 0.25.
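The penalty for missing interactions can be sketched as follows. This is a hedged illustration rather than the real implementation: the function names and the example URLs are hypothetical, and capping the combined score at 1 is an assumption derived from the stated 0 to 1 range of the cheatScore.

```python
from typing import Iterable


def interaction_penalty(expected_urls: Iterable[str],
                        visited_urls: Iterable[str]) -> float:
    """Half of the fraction of expected interactions that never happened."""
    expected = set(expected_urls)
    if not expected:
        return 0.0  # nothing is expected for this challenge, so no penalty
    missing = expected - set(visited_urls)
    return 0.5 * len(missing) / len(expected)


def apply_interaction_penalty(time_based_score: float, penalty: float) -> float:
    """ASSUMPTION: the sum is capped at 1, the upper bound of the cheat score."""
    return min(1.0, time_based_score + penalty)


# Hypothetical URLs: only one of the two expected endpoints was visited.
penalty = interaction_penalty(
    expected_urls=["/api/Example", "/rest/example/search"],
    visited_urls=["/api/Example", "/unrelated"],
)
```

Here `penalty` is 0.25, because half of the expected interactions are missing and that half is weighted by 0.5.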
Coding challenges
For coding challenges, the cheat score is also calculated based on the expected solving time since the previously solved hacking or coding challenge. As the difficulty of a hacking challenge does not necessarily correlate with its associated coding challenge, the cheat score formulas are based on different criteria.
"Find It" cheat score calculation
The cheat score to find the vulnerable line(s) of code in the given code snippet is based on the length of that code snippet and the number of lines that need to be selected as the correct answer.
Snippet length | Minimum solve time | # vulnerable lines |
---|---|---|
up to 1000 characters | 1 minute | ×# |
up to 2000 characters | 2 minutes | ×# |
up to 3000 characters | 3 minutes | ×# |
up to 4000 characters | 4 minutes | ×# |
etc. | | |
For the Score Board coding challenge the expected solve time is reduced by 50% if the Hacking Instructor is enabled, as there is a Coding Challenges tutorial attached to it.
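Read as a formula, the "Find It" table gives one minute per started block of 1000 characters, multiplied by the number of vulnerable lines. The sketch below follows that reading. The function and parameter names are hypothetical, and the flag for the Score Board tutorial case is simply passed in by the caller rather than derived from any real configuration.

```python
import math


def find_it_minimum_minutes(snippet: str, vulnerable_lines: int,
                            halve_for_tutorial: bool = False) -> float:
    """Minimum "Find It" solve time in minutes, read off the table above."""
    if vulnerable_lines < 1:
        raise ValueError("at least one vulnerable line is required")
    # One minute per started 1000 characters: "up to 1000" is 1, "up to 2000" is 2.
    minutes_per_line = max(1, math.ceil(len(snippet) / 1000))
    minutes = float(minutes_per_line * vulnerable_lines)  # the "×#" column
    if halve_for_tutorial:
        minutes /= 2  # Score Board coding challenge with Hacking Instructor enabled
    return minutes
```

For a snippet of 1001 characters with two vulnerable lines, this yields 4 minutes: the snippet falls into the "up to 2000 characters" row, and the 2 minutes are doubled for the two lines.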
"Fix It" cheat score calculation
The cheat score to detect the right fix for an identified vulnerability depends only on the number of provided fix choices.
Provided choices | Minimum solve time |
---|---|
3 | 1 minute |
4 | 2 minutes |
5 | 2 minutes |
6 | 3 minutes |
Please note that Juice Shop does not allow coding challenges with fewer than 3 fix options to choose from.
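The four rows of the "Fix It" table are consistent with halving the number of choices and rounding down. The sketch below encodes that observation. It is an inference from the table, not a confirmed formula, the function name is hypothetical, and values beyond six choices are an extrapolation.

```python
def fix_it_minimum_minutes(choices: int) -> int:
    """Minimum "Fix It" solve time in minutes for a given number of fix choices."""
    if choices < 3:
        raise ValueError("coding challenges need at least 3 fix options")
    # Integer division reproduces the table: 3 -> 1, 4 -> 2, 5 -> 2, 6 -> 3.
    return choices // 2
```

A literal lookup table with the four documented rows would work just as well and avoids extrapolating to more than six choices.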
Total cheat score
The server also keeps track of the median cheatScore across all solved challenges in the totalCheatScore, which is available via the juiceshop_cheat_score metric and is also sent in each Challenge solution webhook call. The totalCheatScore value is not persisted across server restarts, but its calculation is also not affected by automatic or manual restoring of hacking progress.
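The median aggregation described here could be sketched with an in-memory tracker. The class and method names are hypothetical, and returning 0 before any challenge has been solved is an assumption. Keeping the scores only in a list mirrors the statement that the value does not survive a server restart.

```python
from statistics import median
from typing import List


class CheatScoreTracker:
    """Keeps per-solve cheat scores in memory and reports their median."""

    def __init__(self) -> None:
        self._scores: List[float] = []  # lost on restart, as nothing is persisted

    def record(self, cheat_score: float) -> None:
        if not 0.0 <= cheat_score <= 1.0:
            raise ValueError("cheat score must be between 0 and 1")
        self._scores.append(cheat_score)

    @property
    def total_cheat_score(self) -> float:
        # ASSUMPTION: report 0 while no challenge has been solved yet.
        return median(self._scores) if self._scores else 0.0


tracker = CheatScoreTracker()
for score in (0.0, 0.25, 1.0):
    tracker.record(score)
```

After these three solves, `tracker.total_cheat_score` is 0.25. A median reacts less strongly to a single outlier solve than a mean would.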
An example of the totalCheatScore metric can be seen below:
# HELP juiceshop_cheat_score Overall probability that any hacking or coding challenges were solved by cheating.
# TYPE juiceshop_cheat_score gauge
juiceshop_cheat_score{app="juiceshop"} 0.1201909090909091
The following values for totalCheatScore were measured during activities that are definitely considered cheating while solving the available hacking challenges:
- >99.9% after executing all Integration tests in <1 minute on the author’s Windows 10 laptop
- >99.3% after executing all End-to-end tests in <10 minutes during GitHub CI/CD on Ubuntu
Limitations
The cheat scoring assumes that a single user is hacking the Juice Shop instance. If the application is used by a team, the values need to be considered less reliable, as extra solve speed might come from parallelization of challenges across team members. Similarly, experienced Juice Shop users will also solve challenges faster than a new user, so their speed is likely to trigger cheat detection as well.
If the Juice Shop instance is under the control of the user, any cheat score it reports via Prometheus or Webhook cannot be trusted at all.
All in all, the cheat score should never blindly be used as a tool to caution or sanction somebody. Conversely, a low score should also never blindly be used to determine monetary rewards etc.